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Thick Dielectric Grating on Asymmetric Slab 
Waveguide 


By D. MARCUSE 
(Manuscript received September 13, 1976) 


We discuss an approximate theory of scattering losses of a guided 
mode in an asymmetric slab waveguide with a thick grating on one side. 
The theory is an extension of an exact theory of thick dielectric gratings 
published previously. The results of the theory are presented in 
graphical form. The coupling coefficient between two guided waves 
traveling in opposite directions is calculated and compared to per- 
turbation theory. 


I. INTRODUCTION 


Diffraction gratings deposited on top of a thin-film waveguide are 
useful as input and output couplers.!:2 A guided wave traveling in the 
thin-film waveguide is scattered out into the two regions (air and sub- 
strate) adjacent to the film as it encounters the region of the diffraction 
erating. Ordinarily, the power that is scattered out of the thin-film guide 
splits up into several grating lobes; the number and strength of these 
lobes depends on the grating period D, the depth of the grating 2a, and 
on the shape of its teeth, as shown in Fig. 1. The relationship between 
the propagation constant 6 of the guided wave, the index of refraction 
n; of the medium into which the grating lobe escapes at angle 6,,;, and 
the grating period length D is expressed by the following equation,? 


B — (2am/D) 


oe (1) 


cos 6,5 = 


329 


n3 SUBSTRATE 


Fig. 1—Thick grating on a thin-film guide. 


The integer m indicates the order of the grating lobe, m = 1 is the lobe 
of first order, m = 2, the lobe of second order, etc., and k = 27/) is the 
free-space propagation constant. As the magnitude of the right-hand 
side of this equation exceeds the value unity, the scattering angle be- 
comes Imaginary and there is no scattered wave. (The angle 6,,,; is mea- 
sured with respect to the surface of the thin-film guide.) Thus, it is ap- 
parent that no scattered wave can escape from the film into medium 1 
if D < 2x/(6 + n;k). For values of D that are just larger than the right- 
hand side of the inequality (with n; indicating the larger of the two re- 
fractive indices of the media adjacent to the film, the substrate say) a 
single grating lobe is radiated into the substrate. If D violates the in- 
equality, with n; being the refractive index of the air space (the region 
of lowest refractive index), a grating lobe is radiated into that region. If 
we let the value of D increase further, higher-order grating lobes begin 
to appear. | 

For purposes of coupling power from the outside into the film guide, 
a laser beam is directed at the grating and is aligned to coincide as closely 
as possible with one of the grating lobes. If only one grating lobe exists, 
it is possible to capture most of the laser power and have it trapped in 
the thin-film guide. However, if other grating lobes exist, the laser power 
is split between the guided wave in the film and the other grating lobes 
so that the coupling efficiency for excitation of the trapped modes is 
reduced. It may be inconvenient to design a grating with only one lobe 
since this requires a grating with a very short period and also necessitates 
excitation of the thin-film guide through the substrate. For this reason, 
it is desirable to be able to control the amount of power radiated into 
undesirable grating lobes by shaping the form of the grating, that is, its 
teeth, in an appropriate way. Gratings with specially shaped teeth are 
known as blazed gratings.’ An analysis of blazed gratings cannot be 
performed by using first-order perturbation theory because the grating, 
to be effective, must be thicker than is compatible with perturbation 
theory. 

This paper proposes a new method of calculating grating responses 
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by an approximate method that, nevertheless, allows us to compute the 
response of thick gratings without having to search for the complex roots 
of a large determinant. Our approach is based on the exact grating theory 
described in Ref. 1. This exact theory is limited to TE waves (not an in- 
herent limitation but one of convenience) and is applied to a grating 
defined as the boundary between two dielectric half spaces. A plane wave 
is incident from one side. The electromagnetic field outside of the grating 
is described as a superposition of infinitely many plane waves, most of 
which have propagation constants with one imaginary component. The 
field in the grating region is expressed as a double Fourier series. The 
unknown expansion coefficients are determined by matching of 
boundary conditions, not along the grating surface but along hypo- 
thetical planes adjacent to the grating. 

This approach can easily be extended to the description of a grating 
on one side of a thin-film guide simply by adding the thin film to the 
structure and postulating plane waves in the film region. However, there 
is an important difference between the simple-grating and the wave- 
suide-grating problems. The scattering problem of the grating between 
two half spaces is completely determined by the incident wave so that 
the amplitude coefficients of all the other waves can be obtained from 
an inhomogeneous equation system. The waveguide grating problem, 
on the other hand, leads to a homogeneous equation system. The dis- 
tinction occurs because it is no longer possible to specify the direction 
of the incident wave, which is now the upward (or downward) traveling 
part of the standing wave pattern of the guided mode whose propagation 
constant is not known. In fact, the complex propagation constant would 
now be obtained as the solution of a determinantal eigenvalue equation.” 
However, the search for the complex solutions of a large determinantal 
equation is costly and time consuming and offsets the advantage of the 
original grating calculation. | 

To circumvent this problem, we propose a different approach. It is 
true that the exact eigenvalue of the determinantal equation is complex, 
but we know a priori that the real part of this complex solution, the 
propagation constant, is far larger than the imaginary part, the loss 
coefficient. This observation gives us confidence that it should be pos- 
sible to determine the loss coefficient by computing the amount of ra- 
diated power once the problem has been approximately solved. The real 
part of the complex eigenvalue can be obtained by an approximation that 
is based on results obtained from the simpler grating theory described 
in Ref. 1. We have shown that the effective plane of reflection of the in- 
cident plane wave can be computed approximately by means of the WKB 
method. The comparison of the effective penetration depth computed 
from the WKB approximation with the exact result showed that the 
agreement was reasonable. We found that the penetration depth of the 
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wave was approximately given by the formula! 


2 
dy = rn ee — > ke arctan ve (2) 
_ We use the definition 
= nik? — 6 (3) 
and 
16 = 6? — n5k. | (4). 


Figure 1 shows that 2a is the depth of the grating, n,; the refractive index 
of the thin film, and nz represents the index of the medium above the 
film. The WKB solution that led to (2) fails (in the form used by us) as 
the grating becomes too thin. For this reason, we use as the penetration 
depth (see Figs. 12 and 13 of Ref. 1) 


_ doifdg<a 


d 
. aifdg>a 


(5) 
The information gathered from Ref. 1 thus allows us to define an effec- 
tive film thickness d err (see Fig. 1) as 

d ore =d+ dy (6) 


and thus enables us to calculate iteratively the propagation constant 8 
from (3), (4), and4 


6 
Koder¢ = vw + arctan - + arctan 2 (7) 
Ko Ko 
with 
iB = 62 — nk? (8) 


(v is the mode number of the guided wave; v = 0 for the lowest order TE 
mode.) The refractive index n3 is the index of the medium on the other 
side of the film opposite the medium with index no. 

Once the propagation constant of the guided mode is approximately 
known, we fix the value of that component of the standing wave inside 
of the thin film that approaches the grating and we solve the inhomo- 
geneous equation system that results. It is clear that this equation system 
cannot provide an exact solution since we have already frozen the value 
of the propagation constant and have specified one of the two amplitudes 
associated with the guided wave to the right-hand side of the equation 
system, changing a homogeneous to an inhomogeneous equation system. 
However, we have checked that our approach gives precisely the same 
results as first-order perturbation theory for small values of the grating 
depth 2a. Furthermore, the results obtained from this approximation 
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Fig. 2—The coordinate system in relation to the grating and the film. 


agree well (in cases where agreement is to be expected) with the results 
of the exact grating theory. However, our present theory does not provide 
correct answers for the amplitudes of scattered waves inside the thin film 
that coincide with another guided mode. In this case, a “resonance” 
occurs and the results become meaningless. Coupling among guided 
modes thus cannot be handled by this theory and must be treated dif- 
ferently, as will be described later. 


ll. MATHEMATICAL FORMULATION OF THE PROBLEM 


We use the following representation for the electric field! of the 
structure shown in Fig. 2: 


Ey = exp (—16z) > Cm €Xp (—lpmx) exp (i — mz) 


m=—o@ 


forx =2a (9) 


Ey = exp (—i8z) > Bnm exp [(im/b) nx] exp (i > m2) 


nn=—o0 


for0O <x S2a_ (10) 


Ey = exp (-i6z) {AW exp (~ikmx) 


mM=— © 


: —7 
+ AW exp (ikmx)} exp (i a mz ) forO=x2z-d (11) 


Ey = exp (-162z) 2 Dy», exp (tamx) exp (i = mz ) 
forx =< —d. (12) 
We define 
bm = B- =m (13) 
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and express the parameters appearing in (9) through (12) as follows: 


p2, = nsk? — 62, (14) 
Ki, = nik? — Bin (15) 
o?, = nzk? — B?,. (16) 


The parameter b appearing in (10) is an arbitrary constant that should 
be larger than a. For our numerical evaluations we have used b = 
av’2. Equations (9), (11), and (12) are solutions of the wave equation 
but (10) is not. We solve our problem by substituting (10) into the wave 
equation, obtaining a set of equations for the determination of Bym. 
However, these equations do not determine B,,,, completely; in addition, 

we must satisfy boundary conditions by requiring that E, and dE,/dx 
~ remain continuous at x = 2a, x = 0, and x = —d. All these conditions lead 
to the following set of equation systems 


foe) . / 2 
x [Nar—nm—m'= | (=) + Bo |Mandmmn’| Bum’ = 0 


n’,m’=—o b 
(17) 
3 (om +57) Bnm exp (iF na) = 0 (18) 
va Ww 
1 6S 
ZH ( pr) 
Km 


— Am om oxy (—2ixpd) (1 Ee =n) | Bnm=0 form #0. (19) 


Ky On Kn 


If we remove the restriction m ~ 0 from (19), the combined equation 
system (17) through (19) would represent the exact formulation of our 
problem. However, since this would force us to solve the determinantal 
eigenvalue equation for complex 6, we exclude the equation with m = 
0 from (19) and add instead the following inhomogeneous equation to 
our set 


ZL Ose") 


4+ 20 0 ep (—2ixod) (1 real n) | Bre = 4A‘) (20) 
Ko 00 KoD 

The equation system (17) stems from the substitution of (10) into the 

wave equation. The coefficients M,,,, and Ny—n,m’—m are defined in Ref. 

1. Equations (18), (19), and (20) result from the boundary conditions. 

In fact, the left-most term in parenthesis in (20) as well as the term with 

the exponential function are each individually equal to 2A§*). Equation 
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(20) is (twice) the arithmetic mean of these two equations and (19) (with 
A“) instead of AS") is their difference. We took the arithmetic mean of 
two equations, each expressing the relation between A$"? and B,,,, to 
improve the accuracy of the approximation. The difference must, of 
course, be taken to eliminate A‘* from the exact equation system. 

Equations (17) through (20) allow us to express B, in terms of A$"). 
For purposes of normalization, we express the amplitude coefficient A” 
in terms of the power P carried by the guided mode, 


Aj’) = (21) 


B(den + = + ) 
Yo 90 


Finally, we need the amplitude coefficients of the scattered waves which 
may be expressed in terms of B, as follows, 


Cm = 3 Bnm eXp 1 (Pm + n) 2a (22) 


n=—o 


and 


co 


Dm exp (-tomd) = > 


n=—o 


- ; T ; ‘ 
Km (Km COS Kmd + lom sin k,d) — b N(om COS Ky d + lkm SIN Kya) Bnm 


Kin — Om 


(23) 


Knowing the amplitudes of all scattered waves, we can calculate the 
power that is carried away from the thin-film waveguide. We use the 
partial power attenuation coefficients 


. bal Cnl 


2 = 24 
A2m 20 oP (24) 
and 
om|Din|? 

2 = 25 
A3m Dus eP ( ) 

and obtain the total power attenuation coefficient as the sum 
2a = >) (2aam + 2agm), (26) 

m 


where the summation extends over all real, propagating waves. 
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Ill. COUPLING COEFFICIENT BETWEEN GUIDED MODES 


We are interested in finding the coupling coefficient for power transfer 
from the incident guided wave to its backward traveling counterpart. 
This is an important design parameter for distributed feedback lasers. 
The exact solution of our problem would give us this coefficient because 
we would know the amplitude coefficients of all the waves whether 
guided or not. Our approximate procedure fails if a principal grating lobe 
scatters power into the direction corresponding to another guided mode. 
For this reason, we use a different approach. If we want to couple the 
incident guided mode to the backward traveling mode via the first 
grating order, we need a grating period that is given by the formula,° 


Tw 
D=-—: (26) 
6 
A grating with such a short period does not scatter power out of the 
thin-film guide. We only need to know the amount of power scattered 
per unit length into the opposite direction. If the amplitude coefficient 
of this backward scattered wave is A{~, the coupling coefficient is defined 
as® 


KO Ay? 
Se ee (27) 
28 (dere + (1/yo) + (1/60)) AS” 
To first order of perturbation theory, we obtain from (27) 
2 
7 (28) 


R = ——_—___+_____~-q, 
2B(dert + (1/0) + (1/50) 


The factor a, is the Fourier coefficient of the spatial frequency compo- 
nent 27/D of the grating function. For our triangular grating shape, we 
have 
2aD 2 ioe dD, 

- 71D — DD, sin 7 D (29) 
We have stated the result of perturbation theory only for comparison 
purposes. We evaluate the coupling coefficient from (27) by calculating 
A” with the help of the exact grating theory developed in Ref. 1. 

The simple, exact grating theory can be used to approximate wave- 
guide losses by assuming that all waves that are scattered at the grating 
penetrate through the thin-film boundaries without any further re- 
flection. We shall see that this assumption yields good results if the 
grating is on the side of the film with the greater index difference (the 
air side). For gratings on the substrate side, reflections from the opposite 
film boundary are important and the simple-minded approach yields 
unsatisfactory results. However, it is interesting to compare the results 
of the approximate theory presented here with loss calculations based 


Q) 
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on the simple grating, since such a comparison can tell us when we can 
use the results of the simple grating theory directly and when we need 
the more sophisticated (if approximate) approach presented in this 
paper. A comparison of the two theories is also useful to give us confi- 
dence in the results of the approximate theory. 

The partial waveguide losses can be computed from the simple grating 
theory by using (24) unchanged (except for the fact that the simple 
grating theory of Ref. 1 is now used to compute C,,,) and by replacing D,, 
in (25) with A,, obtained from (16) of Ref. 1. 

A discussion of the number of terms used in the series expansion of 
the field was given in Ref. 1. 


IV. DISCUSSION OF RESULTS 


Careful comparison of the results of our present theory with the per- 
turbation theory’ shows perfect agreement for small values of the grating 
depth 2a. It is, of course, necessary to replace the amplitude of the si- 
nusoidal grating (designated as o in Ref. 7) with the Fourier amplitudes 
coefficient (29). 

To show the difference of the scattering losses that result from using 
the present waveguide theory and to compare it to the simple grating 
theory, we have drawn in Fig. 3 the partial scattering loss of the first 
grating lobe for a grating with vanishingly small depth 2a. The curves 
in this and subsequent figures are labeled accordingly. We normalize 
the loss coefficient by multiplying it with \°/a? to make it dimensionless 
and to reduce its dependence on a. To first order of perturbation theory 
the normalized attenuation coefficient should be independent of a. 

The independent variable on the horizontal axis of all our figures is 
the scattering angle ¢ = 90 — 413 [see eq. (1)] of the first-order beam 
(m = 1), the wave corresponding to this angle escapes into the medium 
with the higher refractive index n3. The angle ¢ is varied by varying the 
grating period D. 

This practice of using the scattering angle of the substrate beam as 
the independent variable and defining it with respect to the direction 
normal to the film surface is taken from Ref. 7. Figure 3 and all subse- 
quent figures use n; = 1.59, ng = 1.0, and n3 = 1.458 (in some later fig- 
ures, No and nz will be interchanged). Furthermore, we use d = degr = 
0.571; this choice was made to compare waveguides having the same 
effective width. Figure 3 applies to a symmetrical grating with D;/D = 
0.5. It is apparent how very similar the results of the two approaches are. 
The air beam disappears at an angle of 43.3°, because we have labeled 
all beams with the angle of the beam in the substrate and the angle of 
the air beam is related to the angle in the substrate by Snell’s law. 

A departure from the results obtained using the waveguide theory and 
the result calculated from the simple grating theory is discernible only 
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Fig. 3—Comparison of the waveguide grating theory with the simple grating theory for 
a symmetrical grating of vanishing depth (2a/ — 0). Shown is the normalized scattering 
loss coefficient of the first-order substrate beam. Film index n; = 1.59, air index nz = 1, 
substrate index ng = 1.458. 


at beam angles that correspond to beams that nearly graze the surface. 
At these angles, reflection from the film-substrate interface becomes 
noticeable and indicates the difference in the solid and dotted curves. 
In particular, we see that the substrate beam, expressed by the solid line, 
vanishes at ¢ = +90°, whereas the dotted line remains at a finite value. 
This difference is caused by the fact that the substrate beam goes over 
into a guided mode in the waveguide case, but in the simple grating, 
where no guided modes exist, the scattering angle in the film can become 
still larger so that there is no discontinuity at the point where the actual 


338 THE BELL SYSTEM TECHNICAL JOURNAL, MARCH 1977 


d/A=0.4 
det¢/A = 0.571 
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= 





—100 —80 


Fig. 4—Substrate beams: comparison of perturbation theory (2a/X — 0) and thick 
grating waveguide theory (2a/\ = 0.5) for asymmetrical grating on the air side of the film. 
The first-order substrate beam loss is shown. 


substrate beam vanishes. However, the angle of the simple grating re- 
sponse has been adjusted by Snell’s law to correspond not to the film but 
to the substrate angle, even though reflection at this interface does not 
exist in case of the simple grating. 

Figure 4 provides a comparison between perturbation theory 
(2a/ — 0) and the first-order grating response for a grating on a thin- 
film waveguide with thickness 2a/\ = 0.5. We have used a film thickness 
of d/\ = 0.4, but the thick grating increases the effective film thickness 
to dere/A = 0.571. To have a meaningful comparison, we have used this 
film thickness also for the case a — 0. Figure 4 shows clearly that per- 
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turbation theory overestimates the scattering losses of thick gratings. 
However, for ¢ near £90°, the agreement between perturbation theory 
and the more precise theory is still remarkably close. This seems to be 
a general tendency which we shall encounter again. Figure 4 holds for 
the substrate beam while Fig. 5 shows a comparison between perturba- 
tion theory and the more precise theory for the air beam. Figure 6 applies 
to the same case, i.e., a symmetrical grating on the air side of the film, 
and shows the total scattering loss (power carried away by all grating 
orders in both media) as the solid line and compares it with the power 
carried away by the first-order grating response in the substrate indi- 
cated by the dotted line. The difference between the total amount of 
scattered power and the power in the first-order substrate beam is made 


d/A=0.4 


des /A= 0.571 





—80 —60 —40 —20 0 20 40 60 80 
Q 


Fig. 5—Air beams: comparison of first-order perturbation theory (2a/\ = 0) with thick 
grating theory (2a/d = 0.5) for the air beam with grating on air-film interface. 
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Fig. 6—The total loss is compared to the loss caused by the first-order substrate beam 
for 2a/\ = 0.5 and a symmetrical grating on the air-film interface. 


up partly by the power carried by the first-order air beam and partly by 
all the other grating orders. As the angle ¢ increases, more and more 
grating lobes appear. Rather than show each grating order separately 
we have added them all and have presented the total loss. The curve 
representing the total loss does not go to zero at ¢ = 90°, because the 
grating responses of higher order do not vanish as the first-order sub- 
strate beam disappears inside of the thin film. 

Fig. 7 shows the scattering losses of an asymmetric grating on the air 
side of the thin film with D,/D = 0. We have included the total scattering 
loss as the topmost solid line, the first-order substrate beam as the dotted 
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line, and the first-order air beam as the lowest solid line. The most 
conspicuous aspect of this figure is the fact that so much more power is 
carried by the first-order substrate beam compared to the first-order 
air beam. The grating asymmetry is responsible for preferential scat- 
tering into the substrate. Comparison of Figs. 4 and 5 shows that the 
symmetrical grating scatters roughly equal amounts of power into air 
and substrate in the angular range where both beams exist simulta- 
neously. Fig. 7 shows that a relatively small amount of power is scattered 
into higher-order grating modes, because the curve for the first-order 
substrate beam does not lie far below the total loss curve. 
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Fig. 7—Total loss, first-order substrate beam, and first-order air beam loss for an 
asymmetrical grating with D,;/D = 0 and 2a/d = 0.5. 
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Fig. 8—Same comparison as in Fig. 7 for an asymmetrical grating with D,/D = 1. 


Figure 8 applies to a grating with the opposite asymmetry, 
D,/D = 1. The total loss is the same as in Fig. 7 but the roles of substrate 
and air beams have been interchanged in the range —43° < ¢ < 48°. For 
angles below this range, the substrate beam is identical to the corre- 
sponding beam for the grating with the opposite symmetry. For ¢ > 48°, 
the substrate beam carries again significantly higher power than inside 
the range —43° < ¢ < 438° but higher-order modes now carry far more 
power at angles ¢ > 48° than in Fig. 7. An explanation of the influence 
of the grating shape in terms of geometrical optics is given in Ref. 1. 

We have compared the results of the waveguide grating theory with 
the simple grating theory in Fig. 3 for the case of very thin gratings. 
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Figures 9 and 10 show such a comparison for a thick, asymmetric grating 
with 2a/A = 0.5 and D,/D = 1. We see that we can obtain most of the 
scattering information from the simple grating theory. The two curves 
depart significantly only near the ends of the angular range of the sub- 
strate beam. 

So far we have considered only gratings on the air side of the thin film. 
The next six figures apply to gratings on the substrate side of the film. 
We obtain these results from our theory simply by interchanging the 
roles of ng and nz, with the values n; = 1.59, no = 1.458, and n3 = 1.0. For 
a deep grating with 2a/ = 0.5 and d/\ = 0.4, we now obtain a very 
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Fig. 9—Substrate beams: comparison of the grating guide theory with the simple grating 


theory for an asymmetrical grating with D;/D = 1 for 2a/\ = 0.5. The partial loss coefficient 
for the first-order substrate beam is shown. 
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Fig. 10—Same as Fig. 9 for first-order air beams. 


slightly different effective film thickness of des/A = 0.569. Figure 11 
shows a comparison of perturbation theory (2a/\ — 0) and thick grating 
theory for 2a/d = 0.5 for the first-order substrate beam for a symmetrical 
grating with D,/D = 0.5. This figure should be compared with Fig. 4, 
because both cases are similar with the only difference being that the 
grating is now on the opposite side of the thin film. The thick grating 
theory is now in much closer agreement with perturbation theory, but 
both theories show a markedly different behaviour from the curves in 
Fig. 4, since there is obviously far more interference between the direct 
beam and the component that is reflected only once at the air-film in- 
terface. The deeper nulls discernible in the thick grating theory (dotted 
lines in Fig. 11) are caused by the fact that a slight shift has occurred that 
places the regions of destructive interference at angles where total in- 
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ternal reflection occurs at the air-film interface. Figure 12 shows a similar 
comparison for the first-order air beam for the same grating geometry. 
This figure should be compared with Fig. 5. Figure 12 is quite similar in 
shape to Fig. 5, but the curves are much lower, showing that air beam 
scattering is weaker if the grating is on the substrate side of the film. 
There are no pronounced interference effects, because the reflection from 
the film-substrate interface is much weaker. The dotted line in Fig. 12 
labeled grating theory was computed with the help of the simple grating 
theory and shows remarkably close agreement with the grating guide 
theory. 
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Fig. 11—Substrate beams: grating on film-substrate interface. Comparison between 
perturbation theory and thick waveguide grating theory (2a/\ = 0.5) for a symmetrical 
grating, D,/D = 0.5 for first-order substrate beams. Note the deep interference nulls. 
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Fig. 12—Same as Fig. 11 showing the first-order air beams. 


Figure 13 compares the total loss to the loss associated with power 
carried away by the first-order substrate beam for a thick grating with 
2a/d = 0.5. 

Figure 14 compares the theory of the simple grating with the wave- 
guide grating theory for the first-order substrate beam for a thin grating 
(2a/ — 0) at the film-substrate interface. We see that the simple grating 
theory does not always suffice to predict the performance of a thin-film 
waveguide with a diffraction grating. The simple grating theory predicts 
the average loss correctly, but fails completely to account for interference 
effects. This figure should be compared with Fig. 3. The comparison 
shows that the simple theory is quite useful as long as interference effects 
between a direct and a reflected beam can be neglected, as in the case 
of the grating on the film-air interface (Fig. 3). For a grating on the 
film-substrate interface (Fig. 14), the simple grating theory is not ap- 
plicable to the waveguide case. Figure 15 shows the comparison of the 
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two theories for a thick grating with 2a/d = 0.5 for the first-order sub- 
strate beam, whereas Fig. 16 compares the corresponding first-order 
beam in the air space. Just as in Fig. 12, the simple grating theory gives 
a good description of scattering for the air beam even if the grating is 
thick and is located on the film-substrate interface. 

The last figure, Fig. 17, shows the normalized coupling coefficient R 
(multiplied by \2/a) as a function of the normalized grating thickness 
2a/n for gratings on the film-air interface (solid lines) and on the film- 
substrate interface (dotted lines) for symmetric (D,;/D = 0.5) and 
asymmetric gratings (D,;/D = 0 and 1). The two extreme cases of 
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Fig. 183—Comparison of total loss and partial first-order substrate beam loss for a 
symmetrical grating on the film-substrate interface. 
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Fig. 14—-Substrate beanis: comparison of the waveguide grating theory and the simple 
grating theory for a thin grating (2a/\ — 0) on the film-substrate interface. 


asymmetry give exactly the same results. At 2a/\ = 0 the curves agree, 
of course, with the perturbation theory (28). The most remarkable fact 
about the curves of Fig. 17 is their slight departure from the prediction 
of perturbation theory. Corresponding curves drawn from perturbation 
theory would be straight horizontal lines coinciding with our curves at 
2a/ = 0. The exaggerated scale of the figure shows the downward slope 
for increasing grating thickness, but even for a grating whose thickness 
is equal to the vacuum wavelength of the wave, the results of the thick 
grating theory differ from perturbation theory by no more than 30%. This 
result is in agreement with the general trend that we observed in Fig. 4, 
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where we saw that the thick grating theory is in close agreement with 
perturbation theory near ¢ = —90°. Coupling between a forward and 
backward traveling guided mode is an extreme case of backward sub- 
strate scattering, except that the beam does not escape into the substrate 
but is trapped in the film by total internal reflection. Figure 4 shows 
clearly how much perturbation theory and thick grating theory can differ 
at scattering angles that are more nearly normal to the film surface. 
Figure 17 thus shows that the perturbation formula (28) is remarkably 
accurate even for rather thick gratings whose thickness approaches the 
vacuum wavelength of the light wave. The curves in Fig. 17 were com- 


d/\ = 0.4 
dots/A= 0.569 


m= 1 


2a/A= 0.5 


-~WAVEGUIDE 
THEORY 


_GRATING 
“THEORY 





Fig. 15—Same as Fig. 14 for thick grating with 2a/A = 0.5. 
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Fig. 16—Same as Fig. 15 showing the partial loss coefficients for first-order air beam 
scattering. 


puted from (27), where the scattered wave amplitude A{” is obtained 
from the simple grating theory.! 


V. CONCLUSIONS 


We have presented an approximate theory for scattering of power from 
a guided thin-film mode into the areas above and below the film by a 
thick diffraction grating deposited on one side. This theory has been 
compared with perturbation theory’ and with the results of the exact, 
simple grating theory for a grating between two dielectric half spaces, 
and good agreement has been obtained in all cases where agreement can 
be expected. We are confident that our theory yields reasonable results 
for light scattering out of a thin film. 

However, this theory does not give correct answers if applied to cou- 
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pling between two guided modes, even in the limit of very thin gratings 
where the correct answer is known from perturbation theory. This failure 
of the theory in the case of coupling among guided modes is under- 
standable when we realize that a guided mode is at transverse resonance 
in the thin-film guide. The naive theory, that is based on the assumption 
that the mode amplitudes remain constant along the thin film, cannot 
account for a resonant situation where the power exchange may be 
complete and where mode coupling leads to new normal modes of the 
structure. On the other hand, it does not seem to hurt the calculation 
of the radiation loss coefficients if a minor grating lobe accidentally 
scatters power into the direction of a guided-film mode. Such “reso- 
nances” do occur, for example, over the angular range shown in Fig. 3 
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Fig. 17—Coupling coefficients between forward and backward guided mode. The solid 
lines hold for a grating on the film-air interface, the dotted lines describe a grating on the 
film-substrate interface. The curves for D;/D = 0 and D,/D = 1 are identical. 
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and the excellent agreement with perturbation theory and with the 
simple exact grating theory indicates that no difficulties have oc- 
curred. 

Our theory has helped to clarify the areas where the simple grating 
theory! may be used to predict scattering losses even for film waveguides, 
but it has also shown that the simple grating theory does not help to 
predict waveguide losses if strong interference between a direct and a 
reflected beam may occur.? 

Finally, we have used the simple grating theory to compute the cou- 
pling coefficient between two guided modes traveling in opposite di- 
rections. We found that perturbation theory holds approximately over 
a surprisingly wide range of grating thicknesses. Coupling between modes 
other than forward and backward modes could be treated very similar- 
ly. 

Our approximate waveguide grating theory has the advantage of al- 
lowing direct calculations of power scattering without the need for a 
search routine for finding the complex roots of a large determinantal 
equation. It is, thus, a cheap and fast method for calculating the scat- 
tering properties of thick gratings on thin-film waveguides. 
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Photon Probe—An Optical-Fiber Time-Domain 
Reflectometer 


By S. D. PERSONICK 
(Manuscript received May 14, 1976) 


This paper describes an optical time-domain reflectometer that 
incorporates a gated photomultiplier receiver. The instrument can 
detect extremely weak reflections from fiber breaks (more than 65 dB 
below the 4-percent reflection of a perfect break) with 0.5-m distance 
resolution. In addition, backward Rayleigh scattering, which occurs 
roughly uniformly along a fiber, can be used to estimate the attenuation 
vs position within a fiber. Therefore, regions of high attenuation can 
be located nondestructively from one end of the fiber. 


l. INTRODUCTION 


Time-domain reflectometers for locating breaks in optical fibers have 
been implemented by a number of researchers!:*._ Typically, these in- 
struments have incorporated GaAs injection lasers to produce pulses 
of light having a high peak power and a narrow width and have incor- 
porated avalanche photodiode (APD) receivers. The present instrument 
incorporates a gated photomultiplier receiver, specifically designed for 
this application. The photomultiplier allows increased sensitivity 
compared to the APD receivers and its gating action allows the user to 
“look behind” large reflections that otherwise would cause undesirable 
saturation of the receiver. 

The instrument is capable of detecting echos from breaks or imper- 
fections that are 65 dB below the 4-percent echo from a perfect break. 
The distance resolution is 0.5 m.* By using a transmitted pulse that is 
wider or narrower (the present pulse is 5 ns in duration), a trade-off 
between sensitivity and resolution can be made. 

In addition, the backward Rayleigh scattering, which occurs roughly 
uniformly along the fiber, can be used to estimate the loss as a function 


* The delay per unit length of light in glass is 5 ns/m. The 5-ns full-width transmitted 
pulse gives a delay resolution of 0.5 m without special signal processing. 
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Fig. 1—Optical time-domain reflectometer. 


of position along the fiber (see Section IV for a clarification of Rayleigh 
scattering). Thus, the loss uniformity can be estimated and regions of 
high loss can be identified nondestructively from one end of the fiber. 


ll. SYSTEM DESCRIPTION 


The basic system is shown in Fig. 1. The light source is a large optical 
cavity, 8250-A injection laser driven by an avalanche transistor. Drivers 
of this type have been described in the literature.* Pulse widths below 
150 ps can be obtained, but the present system operates with a 5-ns 
transmitted pulse width. This was chosen as a compromise between 
resolution in the arrival time of the echo returns and pulse energy. The 
pulse-repetition rate is 5 KPPS. The laser output is collimated by a lens 
and passes through a beam splitter that serves as a directional coupler. 
The beam is then focused onto the fiber to be measured. (If only one end 
of the test fiber is available, the system is aligned with a short “pigtail” 
fiber of the same type as the test fiber; the test fiber is then spliced on 
to the pigtail.) 

Kchoes from the fiber (including echoes from the front face or from 
splices used to attach the test fiber to the pigtail) are directed by the 
beam splitter to the cathode of a gated electrostatic photomultiplier. The 
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photomultiplier cathode has a quantum efficiency of about 8 percent 
at this wavelength. It is mounted in a thermoelectrically cooled housing 
to minimize dark current and maximize cathode life. The multiplication 
factor of the tube is about 2 X 10°. It is quantum noise limited when in- 
terfaced with commercial 50-ohm amplifiers. The tube has two features 
incorporated specifically for this application. The dynode chain draws 
current from a high-impedance divider network that limits the dynode 
current under high-light-level conditions. This minimizes the chances 
of damage due to light overload or abuse. In addition, the tube can be 
gated off by raising the cathode potential approximately 1000 V above 
its nominal —4500 V potential. This gating feature eliminates the satu- 
ration, caused by strong nearby echoes, that is present when using APD 
or PIN diode receivers. 

The high-voltage gate pulses are obtained from a gate generator rated 
at 55-ns rise and fall times with a 30-pF, purely capacitive load. This is 
consistent with the performance measured with the present instrument. 
In this system, the photomultiplier turns on when the gate pulse turns 
off. The gate-pulse width is adjustable from 750 ns to 100 ms. Using the 
adjustable delays available on the generator, the gate pulse can be po- 
sitioned to turn the tube on immediately after any undesired echo has 
arrived. 


lil, PERFORMANCE 


Measurements of reflections (echo scans) using the optical TDR 
(“photon probe”) are shown in Figs. 2 through 11. 

Figure 2 shows the echo scan for a short fiber 70 m long. The fiber 
round-trip loss is negligible. The fiber numerical aperture (NA) is about 
0.2. In the figure, we can see the saturated front-face reflection (no gating 
is applied) and the saturated back-face reflection, both of which appear 
as large exponentially decaying pulses. The saturation effects shown can 
be reduced somewhat by using a different amplifier following the pho- 
tomultiplier. However, experiments reveal that without gating there is 
a long-term reduction (approximately 3 dB) of the photomultiplier gain 
which persists for tens of microseconds after overload of the tube with 
a large echo. When gating is used, this leng-term reduction of the gain 
is eliminated. (See also the discussion of Figs. 6 through 8.) This long- 
term gain reduction is associated with the time constants of the photo- 
multiplier burnout-prevention circuit. A double-round-trip reflection 
can also be seen at about 28 dB down (4 percent by 4 percent) from the 
back-face echo. 

Figure 3 shows the echo scan for the 70-m fiber with gating of the 
front-face echo. The gating voltage was adjusted to leave a small remnant 
of the front-face echo, although complete gating is possible. The delay 
between the front-face echo remnant and the onset of Rayleigh-scat- 
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Fig. 2—Echo scan for a 70-m optical fiber. 


tering reflections (the noise-like part of the trace) represents the reso- 
lution of the gating mechanism. The upper trace represents the position 
of the falling edge of the gating pulse. Fluctuations on the baseline of the 
Rayleigh scattering are due to pick up from the high-voltage gate gen- 





Fig. 3—Kcho scan for a 70-m optical fiber with gating of front-face echo. 
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Fig. 4—Kcho sean for a 70-m optical fiber with front- and back-face echoes gated 
out. 


erator. Since the round-trip loss in this short fiber is much less than 1 
dB, there is no decay in the Rayleigh scattering trace as a function of 
delay (distance along the fiber). 

Figure 4 shows the echo scan for the 70-m fiber as above with front- 
and back-face echoes gated out. Note that there is no noise-like trace 
following the back-face echo. This is one confirmation that the noise-like 
trace is Rayleigh scattering and not dark current or a residue of the 
front-face echo. 

Figure 5 shows the echo scan for the 70-m fiber with no gating and with 
65 dB of optical pad between the beam splitter and the photomultiplier. 
The front- and back-face reflections are weak but visible. 

Figure 6 shows the echo scan for a 600-m low-loss fiber with a 0.2 NA. 
No gating is applied. The amplifier following the photomultiplier in this 
measurement recovers quickly from overload, so the saturation effects 
due to the overload of the front-face echo are not as obvious. The decay 
in the Rayleigh scattering part of the trace is due to fiber attenuation 
vs length. 

Figure 7 shows the same trace as Fig. 6 after boxcar averaging with a 
0.5-s integration time and a 50-s total sweep time. No gating Is ap- 
plied. 

Figure 8 shows the same trace as Fig. 7, but with gating of the front- 
face echo. The fine structure on this trace is system noise (laser pulse- 
amplitude fluctuations), and is not indicative of fiber details. However, 
the rate of decay of the roll off of the Rayleigh scattering and its shape 
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Fig. 5—Echo scan for a 70-m optical fiber with no gating and 65-dB optical pad. 


are measures of the fiber loss and uniformity. With improvements in the 

laser pulser and the signal-processing technique, it is anticipated that 
fine structure in the loss vs length dependence will be obtainable. 

Figure 9 shows the first attempt to use this instrument to analyze a 





Fig. 6—Echo scan for a 600-m optical fiber before boxcar averaging. 
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Fig. 7—Echo scan for a 600-m optical fiber after boxcar averaging. 


fiber with unexplained high loss. An echo scan was obtainable from only 
one end and is shown before averaging. The total loss was stated as 20.8 
dB, but no end-to-end transmission through the fiber was obtainable. 

Figure 10 is the boxcar integrated version of Fig. 9 including front-face 





Fig. 8—Echo scan for a 600-m fiber after boxcar averaging and with gating of front-face 
echo. 
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Fig. 9—First analysis of optical fiber with unexplained high loss. 


echo gating. The integrating time is 0.5 s and the total sweep time is 
about 210s. We see that the fiber has a rapid decay in Rayleigh scattering 
beginning at about 14.6-us delay (obtained from the time scale on Fig. 
9). Also, no back-face reflection is seen (compare Fig. 10 with Fig. 8). The 





Fig. 10—Boxcar averaged version of echo scan of Fig. 9. 
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Fig. 11—Boxcar averaged scan of high-loss optical fiber after removal of lossy sec- 
tion. 


fiber was stated as being 1500 m in length. The round-trip delay is about 
10 ns/m. From this we deduce that most of the high fiber loss is con- 
centrated in the last 50 m of one end of this fiber. 

Figure 11 is the boxcar averaged trace of the high-loss fiber after re- 
moval of 120 m from the lossy end. A back-face echo is now visible. The 
fiber was apparently about 1570 m long. Note that not all of the high-loss 
region has been removed. 

Figure 12 (upper trace) is a repeat of Fig. 11 using a logarithmic am- 
plifier. The lower trace is a repeat of the upper trace with 3 dB of optical 
pad placed in front of the photomultiplier. This calibrates the loss-vs- 
length measurement (the spacing between the curves is 1.5 dB of one- 
way loss) and verified the accuracy of the logarithmic converter. The 
uniform loss-vs-length of the fiber can also be seen. 


IV. THEORETICAL RESULTS 


All of the fibers measured above were coated and wound loosely on 
a foam-plastic drum. No attempt was made to strip cladding modes, but 
this was done in the first few meters by the coating. A rough calculation 
of the expected level of Rayleigh backscattering was made as follows.* 


* Rayleigh scattering is caused by variations in the density or composition of the fiber 
material on a scale that is small compared to the light wavelength. In state-of-the-art 
low-loss fibers, this is the dominant loss mechanism at 0.8-um wavelength. 
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Fig. 12—Upper trace is same as Fig. 11 on logarithmic scale. Lower trace is same as upper 
trace with 3-dB added optical loss. 


If an impulse of light of energy 1 joule is launched into the fiber, the pulse 
energy at a distance / (meters) along the fiber will be 


E(l) = exp [—a(l)]_ (Goules), (1) 


where a(l) is the cumulative attenuation up to distance / in nepers. In 
the distance interval / + dl, the Rayleigh-scattered energy from the fiber 
will be 

dE,(l) = a, exp [—a(l)]dl (joules), (2) 


where a, is the Rayleigh-scattering loss in nepers per meter (assumed 
constant).' Of this scattered light, a fraction S will be recaptured trav- 
eling back down the fiber toward the transmitter.* At a time t = 2l/c, 
where c is the speed of light (m/s), this scattered return will arrive at the 
transmitter end with energy 


OE. echo = Sas exp [—2a(l)]dl (joules) (3) 


and will produce a response that has duration 2dl/c. Thus, the amplitude 
(power) of the scattered echo at time t is 


pepe es | -24 (5) (watts). (4) 





2 


* It is possible that the Rayleigh scattering from the fiber is not independent of position, 
and thus a, should be replaced by a, (/) in such cases. Also, there is a possibility of other 
(nonisotropic) types of scattering (e.g., Mie scattering from air bubbles) which do not have 
the same fractions, S, of recaptured light as Rayleigh scattering. Thus, in some cases S 
may be a function of / as well. 


364 THE BELL SYSTEM TECHNICAL JOURNAL, MARCH 1977 


Thus, (4) represents the backscatter impulse response of the fiber. The 
fraction of scattered light recaptured is given by (assuming Rayleigh 
scattering is approximately isotropic) 
m(NA)? _ (NA)? 
Arn? An2 ’ 








S x (5) 
where NA = fiber numerical aperture (approximately 0.2 for the fibers 
used), n is the fiber index of refraction (approximately 1.5), and NA/n 
represents the half angle of the cone of captured rays. Thus, for the fibers 
used, S is approximately 0.005. For the fibers used in these experiments, 
a, is about 4 dB/km or 0.0009 nepers/m at 0.82-um wavelength. Thus, 
the backscatter impulse response is approximately 


t 
p(t) = 2.3 X 107®c exp | -24 (S) | 
If the transmitted pulse in an actual measurement has peak power Po 
and width W, the backscattered power vs time is* 


ct 
Potal backscatter = PoSa;We exp | -24 (5) 


ct 


~ Py X 2.3 X 10-8 exp } ~24 (5) 


for c=2xX108, W=5X 107-9 seconds. 


Thus, the backscatter power for a 5-ns transmitted pulse is about 43 dB 
below the 4-percent reflection of a perfect break. 

The backscatter can be increased by using a wider transmitted pulse 
with a corresponding loss of resolution. The above result is consistent 
with Fig. 3 where the backscattering can be compared to the double- 
round-trip reflection, which is roughly 28-dB down from a perfect break 
reflection. - | 


V. CONCLUSIONS 


Using the above apparatus, we can detect extremely weak reflections 
from fiber breaks and imperfections. In addition, backward Rayleigh 
scattering can be used to estimate the loss as a function of position within 
the fiber nondestructively from one end. 

The precise measurement of loss is a complicated process which re- 
quires great care to specify and control launching conditions and to 
obtain repeatability. The present apparatus is not intended as a sub- 


* This equation assumes that exp [—2a(ct/2)| is approximately constant for intervals of 
time ¢ of duration W (pulse width). Otherwise one must use the average value of 
exp [—2a(ct/2)] over the interval (t, t + W). 
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stitute method of precise loss measurements, although refined versions 
of this apparatus could possibly serve that purpose. However, using this 
instrument, loss uniformity can be determined and concentrated loss 
sections in a fiber or cable can be located. In addition, fiber breaks can 
be located. 
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Computer Displays Optically Superimposed on 
Input Devices 


By K. C. KNOWLTON 
(Manuscript received August 3, 1976) 


A set of pushbuttons on a console may appear to have computer- 
generated labels temporarily inscribed on them if the button set and 
computed display are optically combined, for example, by means of a 
semitransparent mirror. This combines the flexibility of light buttons 
with the tactile and kinesthetic feel of physical pushbuttons; it permits 
a user to interact more directly with a computer program, or a com- 
puter-mediated operation, in what subjectively becomes an intimately 
shared space. | 

A console of this design can serve alternately as a typewriter, com- 
puter terminal, text editor, telephone operator’s console, or com- 
puter-assisted instruction terminal. Each usage may have several 
modes of operation: training, verbose, abbreviated, and/or special- 
privilege. Switching from one mode or use to another is done by 
changing the software rather than hardware; each program controls 
in its own way the momentary details of visibility, position, label, sig- 
nificance, and function of all buttons. 

Several demonstrations are described, including a prototype of a 
proposed Traffic Service Position System (TSPS) console, and an in- 
teractive computer terminal resembling a Picturephone® set with a 
Touch-Tone® pad. Also suggested are combinations of computed dis- 
plays with x-y tablets and other input devices. 


In interactive use of computers, a large number of advantages result 
from virtually superimposing the computed display on an input device 
such as a two-dimensional array of pushbuttons.!? A display so arranged 
can be used effectively to label buttons or relabel them with new 
meanings; indeed the buttons themselves may seem to appear and dis- 
appear according to their momentary significance or nonsignificance 
to the program. The same composite console—display plus input de- 
vice—may have vastly different uses depending on the program that 
labels buttons and reacts to them. Thus combined are complete flexi- 
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Fig. 1—Basic arrangement for superimposing a computed display on a two-dimensional 
array of buttons. 


bility, normally associated with light buttons, and the tactile and kin- 
esthetic feel of physical buttons that move, as on a typewriter. A button 
set may thus “be” a typewriter, calculator, telephone operator’s console, 
computer-assisted instruction terminal, or music keyboard. An x-y tablet 
or other two-dimensional input device may likewise have a computed 
display superimposed on it. In all cases, the user enjoys a sense of close 
interaction with the computer in an intimately shared input-output 
space. 


1. BASIC PRINCIPLES 


A straightforward way of superimposing a display on a button set in- 
volves a semitransparent mirror, as illustrated schematically in Fig. 1. 
The user looks through the mirror and views his/her hand directly as it 
pushes buttons. The display—a television monitor in the illustration— 
and mirror are so arranged that the virtual image of displayed light 
buttons conforms in three-dimensional space exactly with the position 
of the physical keytops. Perceived spatial congruence is so precise that 
if the display is a bulging TV screen, then it is best for the button tops 
to conform to a convex envelope, as in Fig. 1, so that the central buttons 
do not seem too soft (i.e., so that the finger meets the physical button 
top at exactly the same depth as the image position). When displayed 
image, button set, and mirror are properly aligned, there is no parallax 
effect and bystanders perceive interactions exactly as the user does. In 
fact, the actual buttons need not be seen—it is best if they are painted 
a dull black so that they seem to disappear when the corresponding light 
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Fig. 2—(a) Laboratory setup for experimenting with displays superimposed on button 
tops. (b) Closeup of TV screen, mirror, and 12 by 10 set of buttons. 


button is extinguished. For proper hand-eye coordination, the user does 
want to see his/her hand; therefore, lighting from the side is useful. If 
it is strong enough, then the mirror used need be only slightly trans- 
mitting (say 10 percent) and may thus be highly reflecting (say 75 per- 
cent) to maintain high visibility of the virtual display. Ambient light is 
no problem except that strong room illumination, direct or reflected, 
should be kept off the display screen. 

The display, of course, needs to be generated upside down so that it 
appears right side up when viewed in the mirror. This is no fundamental 
problem except that one commonly imports or implements software in 
which the assumption of right-side-up generation (of alphabetic char- 
acters, for instance) may be embedded deep in the code. 

One curiosity of these systems is that the hand seems transparent to 
light buttons, since it does not intervene in the path of reflected light. 
We can read through our fingertip the current label of the button pushed, 
as well as see other buttons beneath the hand. This is not in the least 
confusing to a user who has been at the machine for a few seconds; on 
the contrary, it is definitely helpful not to have to remove your hand to 
see what’s beneath it. 

Figure 2a is a photo of one generally useful laboratory prototype for 
experimenting with usages of virtual pushbutton consoles; Fig. 2b is a 
closeup of display, mirror, and button set as seen from farther away and 
lower than the user’s normal head position. The computer used has 32K 
24-bit words of core storage; programming is done in FORTRAN and an 
assembly language. The display is a normal 525-line TV monitor, with 
separate red, green, and blue (RGB) inputs, refreshed 30 times per second 
by specially built hardware from a separate core memory that holds 3 
bits per picture cell.?+ (The displayed picture is only 496 lines of 528 
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pixels per line.) Each of the eight logical colors is program-definable to 
128 levels per primary. The button set is a 12-wide by 10-high array of 
pushbuttons, 44-in. square on 1-in. centers, each with ,-in. travel. The 
computer reads only rows and columns in which buttons are momentarily 
depressed—all single hits are clearly decodable, as are multiple hits in 
the same row or column and some patterns produced by progressively 
adding buttons. The mirror is 16 in. square by % in. thick; it is first- 
surface 75 percent reflecting and 10 percent transmitting. 


ll. PROTOTYPE FOR A TELEPHONE OPERATOR’S CONSOLE 


The setup of Fig. 2 has been used to implement an experimental 
demonstration of a flexible telephone operator’s console—in particular, 
a possible replacement of the present Traffic Service Position System 
(TSPS) station and/or future versions of it.© Figures 3 and 4 illustrate 
many of the features that such a console might have. In the demon- 
stration, button tops are 44-in. green squares, containing green labels; 
lines connecting logical groups of them are blue—sometimes these alone 
appear where an entire set of button tops has vanished in order to pre- 
serve a sense of orientation and geography. Lights at the very top of the 
board, indicating what type of call is presently being processed, are red, 
as are occasional wide frames around buttons, pointing out mandatory 
operator actions. 

The sequence of Figs. 3a through 38e illustrate the handling of a par- 
ticular call, which is being charged to a third phone. Figure 3a shows the 
board before the call arrives, with only a few buttons present, indicating 
the limited number of things an operator can take initiative on with no 
call to process, such as to inquire as to the time of day (lower left). In Fig. 
3b, a “zero +” call has arrived from a non-coin phone (the calling party 
has dialed the called number, but asked for operator assistance by the 
leading zero). The operator asks “May I help you?” and the calling party 
requests that the call be charged to a third phone, whereupon the oper- 
ator prepares to push the special calling (SPL-CLG) button. With the class 
of charge thus declared (Fig. 3c), this button lights brightly and other 
class-of-charge buttons disappear; also, the key pulse special (KP-SPL) 
button lights with a red frame, in effect insisting that the operator enter 
a third phone or credit card number. (Notice that the KP-SPL with red 
frame can be read through the hand.) When the operator pushes this 
button, the keyset appears and is used for entering the third phone 
number (Fig. 3d). The number entered appears in the center of the panel 
and the SPL-NO display button appears, indicating that henceforth there 
is a special number, which may be redisplayed at a later time. The ST- 
TMG button with red frame means that no more information is needed; 
if it is pushed, the telephone machinery may start timing the call as soon 
as the called party answers the phone. When this button is pressed, it 
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Fig. 3—Demonstration of a charge to third phone. (a) Quiescent board. (b,c) Operator 
pushes SPL-CLG button. (d) Third phone number is keyed in. (e) ST-TMG button pushed, 
permitting POS-REL. 


disappears (Fig. 3e) and the POS-REL button appears, permitting the 
operator to release the call from this position, whereupon the board re- 


verts to the quiescent state of Fig. 3a. 
The sequence in Fig. 3 shows that this console is dynamic even during 
the processing of a phone call; all of the buttons that have meaning at 
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any moment, and only those buttons, are visible. The sequence in which 
they appear, in fact, tends to lead the operator through the required 
series of decisions and actions; this should be a significant help in the 
training of new operators (there could be a verbose mode, or a HELP 
button to spell out in words or phrases the meaning of buttons or situa- 
tions encountered.) 

Figure 4 illustrates more features of the TSPS demonstration. For the 
sake of comparison, Fig. 4a shows the complete set of buttons corre- 
sponding approximately with the currently used board. It is not imme- 
diately obvious here that the KP-TBL (meaning, “prepare to key in a 
trouble code’’) is one of the few meaningful actions when no call is 
present. (Figure 4d, on the other hand, is the recommended appearance 
of the quiescent board, with only valid buttons showing; here, the op- 
erator is inquiring as to the time, which is displayed while the TIME 
button is pushed.) Figures 4b and 4e compare the left- and right-handed 
boards; a left-handed operator presumably will want the keyset for en- 
tering phone numbers, etc., on the left. (Programming, as one should 
expect, is done in terms of logical buttons—where each button happens 
to be at any time is a matter of mapping. Individual operators might even 
be allowed to make their personal rearrangements of the board.) Figure 
4c shows the entire board temporarily turned into a typewriter keyboard 
for possible future applications requiring alphanumeric input. Figure 
4f repeats something that already appeared in Fig. 3: a call originates 
from a patient at a hospital, an institution which does not want to be the 
collecting agent for phone calls and therefore requests the phone com- 
pany not to let the call be billed to the calling phone. Thus, in the spot 
where the operator might have expected a class-of-charge button PAID 
(by the calling phone) there is an explanatory note, HOSP, which neither 
appears nor functions like a button. 

The key word is flexibility, including the option of introducing new 
buttons for new services or functions. All such alterations, including 
adding, modifying, rearranging, relabeling, or deleting buttons, are 
changes in software; they would be much easier to implement on any or 
all of the consoles than would be equivalent changes in hardware (once 
the changeover has been done). 

One should finally note that, for the handling of phone calls, a great 
deal of electronic equipment is already needed, including circuitry and 
other means for detecting and decoding button pushes. The significant 
addition suggested by the present prototype is that the main call-han- 
dling mechanism could tell the console what buttons to light and ex- 
tinguish, and where. In other words, the console would show what button 
presses are legal, something which is already implicit in the program. 
Operators would be less likely to do things out of order, simply because 
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Fig. 4—Features of TSPS demonstration. 


they would not expect anything to happen in response to pushing an 
unlighted button. A simple case in point is: if the start timing (ST-TMG) 
button is not present but the class-of-charge panel is entirely illuminated, 
it should be immediately obvious even to the beginner that the class of 


charge still needs to be declared (perhaps among other things) before 
it is legal or possible to start the timing of the call. 
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lil. A RELABELABLE ‘“TOUCH-TONE”’ PAD AS AN INTERACTIVE 
CONSOLE 

A set similar to a Picturephone set could be used as an interactive 
remote computer console, with the Touch-Tone pad as the input key- 
board. A schematic for a mockup is shown in Fig. 5, where a semitrans- 
parent mirror effectively puts the computed image on the Touch-Tone 
buttons so that the 12 buttons can have several labelings and a corre- 
spondingly extended range of functions. A proposed new feature, as il- 
lustrated in Fig. 5b, is the use of the bottom quarter of the screen as a 
light source for illuminating the hand, but only when function buttons 
are displayed, not when some other graphic program result is being 
shown. In the latter instance, when the hand should not be seen, the 
screen “light” goes off. A partial cabinet hides the hand from room 
light. 

Such a console has been built using a computer, a Touch-Tone pad 
_as the keyboard, and a color television RGB monitor so modified that each 
of its three color signals is essentially a Picturephone signal.®° The picture 
is again 3 bits per pixel with a total of 254 lines, 240 pixels per line. The 
front-surface mirror is 45 percent transmitting, 45 percent reflecting, 
with an antireflective magnesium fluoride coating on the second surface. 
Figure 6a shows a distant view of the button set and (inverted) display, 
whereas Fig. 6b shows the user’s view for this same circumstance. These 
buttons are not black, yet the computer-generated image effectively 
obliterates the intrinsic labels on the buttons to the extent that one could 
turn the pad into a normal calculator, high numbers on top, with no 
confusion as to current numbering. 

Two demonstration graphics programs have been written for this 
system. The first, whose three basic button labelings appear in Fig. 7, 
provides for drawing electronic circuit schematic diagrams, such as the 
one shown in Fig. 7d, by the juxtaposition and combination of basic 
patterns. A pattern is selected by pressing a key labeled by a small picture 
of the pattern (see Fig. 7a); the button marked NEXT EIGHT causes 
paging through several such sets of eight patterns. When pressed, the 
pattern is framed, as shown. The user may branch to that part of the 
program which places the new element on the faintly visible current 
picture by pushing PLACE. (Program branches involving relabeling 
buttons are indicated by symbols resembling miniature sets of 12 buttons 
with a label alongside.) Figure 7b shows the result: buttons for moving 
the new instance, for saying ‘‘OK, add it,” and for seeing the result. The 
LET’S SEE button causes the button set and labels to go out, likewise the 
light illuminating the hand, whereas the faint circuit diagram in the 
background comes up to full brilliance, as in Fig. 7d. The speckling of 
unused buttons serves to obscure the intrinsic Touch-Tone labels. An- 
other program branch provides for the redefinition of a pattern (see Fig. 
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Fig. 5—(a) Console setup with a relabelable Touch-Tone pad as an iterative graphics 
console. (b) Side view of console showing extra mirror for reflecting bottom-of-screen light 


source. 

7c). Here, an enlarged pattern appears on the right with a 3 by 3 window 
drawn on it. Contents of this window are displayed on the top nine 
buttons, where a button press flips the cell. The arrow buttons move the 


VIRTUAL PUSHBUTTONS 375 





(b) 


Fig. 6—(a) Distant view of simulated Touch-Tone console setup showing inverted image 
on screen, mirror, and user’s hand operating the buttons. (b) User’s view (through mirror) 
of situation in (a) with Touch-Tone buttons effectively relabeled by the computed dis- 


play. 


window over the pattern; OK returns control to the rest of the program, 
with the pattern redefined. 

A more elaborate program uses the capabilities of the color monitor 
to generate four-color designs like the one shown in Fig. 8 by selecting 
and applying variously stretched and positioned instances of basic 
patterns. Figure 9 shows its seven basic button labelings. The user may 
start with any of the four colors as background (Fig. 9a) and selects a 
pattern as before (Fig. 9b). In addition to placing it anywhere on the 
screen, the user may change its height and width independently (Fig. 
9c). Before finally drawing the addition onto the picture, an optional 
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PICK A PATTERN 





Fig. 7—(a) through (c). Button labelings of a program for composing electronic circuit 
diagrams. (d) Resulting diagram. 


border, of width 0, 1, 2, or 3, and its color, are chosen (Fig. 9g). The colors 
themselves may be redefined (Fig. 9f) by increments or decrements of 
the three primary colors, the + button serving to flip between modes ADD 
and SUBTRACT. Throughout the program, buttons which are temporarily 
meaningless disappear: if border width is zero, the color buttons vanish; 
if no more red can be added in defining the currently selected color, RED 
goes out; if position or size are extreme, the corresponding cursor arrow 
button disappears. 

To summarize the Touch-Tone demonstrations, a complicated in- 
teractive graphics program can be run by means of a 12-button Touch- 
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Fig. 8—Sample result of a more general program for production of four-color designs made 
of variously stretched and positioned geometric patterns. 
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Fig. 9—The seven basic labelings for design-generating graphics system used to produce 
Fig. 8. 


Tone pad if the buttons are easily relabeled to provide a rich variety of 
functions. Button forms and features found useful are: 
Solid square with a 1- to 3-word label alongside. 
Small pictures of symbols significant in the program. 
Other iconic symbols: 
An eye meaning ‘“‘Let’s see the picture.” 
Miniature 12-button set with label: a program goto in- 
volving relabeling. 
Arrows for positioning a cursor and setting its size. 
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Picture cells in a basic pattern being defined. 
Speckles to hide the intrinsic label of a nonfunctioning button. 
A frame marking the current selection. 
Alternation or cycling: 
Paging through sets of patterns. © 
Add vs subtract for defining colors. 
Black vs white cells in pattern being defined. 
Buttons that disappear when not meaningful: 
Cursor arrows when at extreme size or position. 
Color increments when at limit of range. 
Border color when width = 0. 
Button set that disappears for viewing program result. 
Frame around logical groups of buttons: 
Enlarged window of pattern cells. 
Pattern set. 


IV. COMPARISONS WITH OTHER DEVICES 


The existing system closest in form and function to this one is the 
touch panel”® developed as a part of the Plato computer-assisted in- 
struction project,’ where the user’s finger on the screen intercepts one 
vertical and one horizontal light beam, with position decoded to one spot 
out of 16 by 16. Virtual light buttons have the following advantages over 
the touch panel: 

(1) ‘The hand is transparent; it does not obscure buttons pressed or 
buttons below it. 

(tt) The keyboard has tactile feedback through button motion, but it 
does not respond to fingers passively resting on unpressed but- 
tons—both characteristics are very important in typing. 

(tit) Keyboards can easily be made for the simultaneous detection of 
more than one button hit (as in defining a pattern, typing a capital 
letter, or inputting a musical chord). 

(tv) Only one kind of electronic-detection circuitry is required, that for 
detecting contact closings. (Almost every commonly used system 
has a keyboard). 

A light pen, like the touch panel, also is a single-position indicator, 
and it obscures part of the region pointed to, but it does permit drawing 
free-hand curves and precise positioning on the picture cell level. Virtual 
light buttons do not lend themselves well to these tasks; for such oper- 
ations we might use instead an x-y tablet with virtually superimposed 
display. We can, however, use a button array for pointing with much finer 
resolution than button spacing, by any of a variety of protocols: 

(1) A first button hit can position a cursor at the button center, and 

thereafter a small panel of four buttons in the left or right lower 
corner may serve to step the cursor in fine increments. In addition, 
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(11) 


the cursor may slew in any of these four directions if a slew button 
is simultaneously depressed. 

Alternatively, after a single button hit positions the cursor to the 
center of the button, a second nearby button depressed, before the 
first is released, can mean “so many subdivisions in this direction.” 
This method is quickly learned and provides for easy positioning 
on a grid five or seven times finer in both directions than button 
spacing. Subsequent button hits, relative to either the first or second 
button, could mean picture cell displacements in the corresponding 
direction. 


V. ONGOING WORK 


Experimental and developmental work is continuing with hardware 
and with both general and specific software, as follows: 


(7) 


(it) 


The virtual light button setup is being considered as a possible form 
for a TSPS console. It would have the following advantages for op- 
erating companies: it would be expected to reduce training time; 
additions or changes in service or protocol would not require 
hardware changes to the consoles; one design would serve many 
purposes—handling phone calls, maintenance, traffic control, 
clerical work. Implications of the latter are that the operator’s job 
could be restructured considerably, with periods of instruction or 
other jobs easily interleaved with normal call handling in off-peak 
hours. 
A versatile and economical console is being designed, and a mockup 
built, using a 128-button panel, and a 512 by 512 60 pel/inch plasma 
panel!® as the display, as shown schematically in Fig. 10. (The 
plasma panel needs no external refresh system—cells may be lit 
or extinguished individually, and each retains its state until 
changed. The panel is flat, permitting the button set to be flat, and 
since picture cell positions are defined by the structure of the device, 
the overall setup cannot become misaligned by electronic drift.) 
The board is arranged basically as an 11 by 11 array on %4-in. centers 
(normal typewriter spacing) with slight adjustment of the bottom 
rows so they conform closely to a regular typewriter. Some buttons 
hang partially off the 8% by 84-in. display area; their meanings will 
normally be understood. The labels in Fig. 10 indicate how a 
typewriter keyboard is intended to be mapped onto the buttons set; 
the buttons can also be used as an 11 by 11 rectangular array but 
with some distortion in the lower lines, at least for the lowest 12- 
button row. Arrangements are being made to read all combinations 
of simultaneous button presses. 

Commercially available plasma panels, sadly, are monochrome 
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Fig. 10—Console under construction using plasma panel display. (a) Side view. (b) Button 
layout: 128 buttons basically positioned on %4-inch centers. Labels illustrate usage of lower 
board as a typewriter. Large square indicates virtual position of 84-inch square plasma 
panel display area. 


(neon orange). They are, however, much more economical than 
color TV monitors plus refresh buffers. 
The ultimate flexible-but-economical console is expected to be 
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(iv) 


close to the above design of keyboard + plasma panel, with ad- 
dressing and driving electronics of the plasma panel in the lower 
cabinet, not in a bulky frame around the display. It would contain 
a character-generator capable of generating (hierarchically) parts 
of buttons, button tops, and sets of button tops. The remote com- 
puter would then need to designate only which light button(s) to 
put up and/or extinguish, and where, whereas the console would 
report button hits, perhaps with local culling of hits on nonlighted 
buttons. Both are low-capacity channels; a twisted pair phone line 
would suffice. An 11 by 11-in. mirror would be big enough for a 
single user (larger mirrors are useful for demonstrations). 

A general-purpose software package under development will ulti- 
mately facilitate the writing of specific usage programs. It will 
permit convenient design and labeling of light buttons, plus facil- 
ities for conveniently defining mappings between logical and 
physical buttons and describing changes in state: appearance and 
disappearance of buttons, changes in values of variables, and flow 
of control. It will also provide a testing ground for one of the au- 
thor’s basic attitudes about usage of such a system: that there 
should always be exact correspondence between buttons which 
appear and those responded to. This ground rule should aid the 
development of complex systems like TSPS where there is a huge 
number of combinatoric states of the board, and where it is a big 
and difficult job to define precisely and completely which button 
hits are or should be legal from instant to instant. 

One application nearing completion is a text editor, where text 
being worked on appears in the top part of the screen, while the 
bottom part serves as a typewriter keyboard. The novel feature of 
the setup is that pointing (to lines or words or positions for deleting, 
changing, or inserting) is done by pointing into the text. Text is 
displayed with three lines of five characters on each button top, and 
one designates a character by a sequence of two button hits: first 
the button on which the character appears, followed by the same 
button if the character is centered on this button, or by a nearby 
button in the direction that the character is off-center (the second 
button is always one of the 3-high by 5-wide subarray centered on 
the first). 
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100-GHz Measurements on a Multiple-Beam 
Offset Antenna 


By R. A SEMPLAK 
(Manuscript received September 8, 1976) 


Large-capacity satellite communication systems can be developed 
by equipping both satellites and earth stations with multiple-beam 
antennas. Such antennas can be produced from an offset Cassegrain 
with offset collectors as feeds. 


I. INTRODUCTION 


Large-capacity satellite communication systems can be achieved by 
equipping both the satellites and earth stations with multiple-beam 
antennas.! An offset Cassegrainian antenna (Fig. 1) fed by multiple, but 
separate, small corrugated horns has been proposed for such a system.? 
Measurements and theory have indicated that the offset geometry results 





Fig. 1—100-GHz beam-scanning antenna. 
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Fig. 2—Plan view of 100-GHz beam-scanning antenna. (a) Plan view. (b) Side view. 


in an ideal configuration? for both earth-station and satellite antennas 
since the aperture has no blockage; this significantly reduces the side- 
lobe levels and, in turn, reduces adjacent station interference. The 
measurements to be discussed here were obtained with a dual-mode feed 
horn. The current state-of-the-art needs to be pushed to its maximum 
potential before a satisfactory corrugated (hybrid-mode) horn at 100 
GHz can be produced; utilization of a corrugated horn as a feed should 
further improve the performance of the scaled model discussed here. 
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This 100-GHz exploratory study utilizes an antenna that scales to 
about 2 m at about 30 GHz on a satellite (shown in Fig. 1). The antenna 
consists of a 60.96-cm-diameter, numerically machined, parabolic section 
on the right; a confocal 38.1-cm-wide by 10.2-cm-high, numerically 
machined, hyperbolic subreflector on the left; and in the center, an offset 
collector feed that consists of a 5.08-cm, numerically machined, parabolic 
reflector illuminated by a dual-mode horn. The feed is mounted on a 
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Fig. 3—Far-field radiation patterns of the offset collector feed for the principle linear 
polarization. 
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Fig. 4—Azimuth scan of the far-field radiation pattern of the antenna for zero feed dis- 
placement (y = 0, z = 0); this is the reference position 9 = 0°. 


device that provides coarse and fine adjustments in both the y and z axis 
(into the plane of the photograph and along the axis of the main para- 
boloid, respectively). Fine adjustments are provided in the direction of 
the x axis and rotation about this axis is also provided. 

The 60.96-cm-diameter main reflector was obtained by reducing a 
76.2-cm-diameter reflector with absorbing material whose one-way 
transmission loss at 100 GHz is of the order 26 dB. The absorber is po- 
sitioned as shown in Fig. 1 to avoid blockage. Hence, contributions to 
the far-field radiation pattern from the masked area are small. 
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Figures 2a and 2b are the plan view and side view, respectively, of the 
scale model showing various dimensions: From these dimensions, we 
observe that the aperture is large in wavelengths and the f/D (i.e., the 
ratio of prime focal length to diameter) is large, 1.9; hence, we should be 
able to scan tens of beamwidths by lateral displacement of the feed.® 

The radiation characteristics of the feed, shown in Fig. 3, are for the 
principle linear polarizations in the azimuthal plane. We observe from 
this figure that the amplitude is very similar for both polarizations. The 
tick marks on this figure, which denote the edges of the apodized main 
reflector, indicate an illumination taper of the order 18 dB. 
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Fig. 5—Azimuth scan with beam displaced @ = 3.4°. 
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Fig. 6—Azimuth scan with beam displaced 8 = 4.3°. 


ll. FAR-FIELD RADIATION MEASUREMENTS 


As determined by using Bell Laboratories radio range, measurement 
of the incident field presented to the aperture of the beam-scanning 
antenna indicates an amplitude variation of the order 0.5 dB. Also, 
variations in the refractive index of the air along the 480-meter range 
produce scintillations. These scintillations are time-of-day dependent 
and their effects were minimized by measuring in the early morning and 
late afternoon. Overcast days were ideal. The siting of the antenna is such 
that radiation-pattern measurements in the elevation plane are per- 
turbed by ground reflections and therefore are not discussed. 
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The measurements shown in Figs. 4 through 11 are made in the azi- 
muthal plane for vertical polarization. For Fig. 4, the feed is on axis and 
all subsequent beam-scanning measurements are referred to the on-axis 
measurement. Here in Fig. 4, we see a well-behaved far-field pattern with 
side lobes about 27 dB down located two and one-half beamwidths from 
axis. The insert of this figure is a sketch showing the relative location 
of the feed with respect to the subreflector and main reflector. The arrow 
indicates the direction of the transmitting source. 

Figures 5 through 10 show measurements obtained by displacing the 
feed laterally (vy > 0). The feed was adjusted in all its degrees of move- 
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Fig. 7—Azimuth scan with beam displaced 6 = 5.0°. 
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Fig. 8—Azimuth scan with beam displaced 6 = 6.25°. 


ment to optimize the gain for each displacement of the feed, including 
x and z. The angular beam displacement 6 is indicated on each figure. 
We observe the decrease in gain (at zero angle) and also that the side 
lobes continuously decrease with increase in scan angle. 
Measurements were also made by displacing the feed in the opposite 
direction (y < 0) and Fig. 11 is typical; this pattern is almost mirror 
symmetric to that of Fig. 7. One notes the similarity and therefore con- 
cludes that the feed is on the electrical axis of the antenna for the 6 = 0 
pattern. Further examination and comparison of Figs. 4 through 11 
disclose the lack of deep minima for the first nulls of the on-axis position 
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(Fig. 4); the minima are much more pronounced in Figs. 5 and 6. Im- 
provement in performance for the “‘on-axis” position was explored by 
moving the feed in small increments, but the absence of sharp nulls for 
this reference position (Fig. 4) can not be explained at this time. 


lil, BEAM-SCANNING MEASUREMENTS 


Figure 12 is a plot of gain (relative to the on-axis gain) versus scan 
angle. The footed vertical bars indicate the maximum and minimum gain 
measured for the indicated scan angle. The solid dot is the average of 
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Fig. 9—Azimuth scan with beam displaced @ — 7.24°. 
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Fig. 10—Azimuth scan with beam displaced @ — 8.01°. 


measurements within a 0.2 degree interval. The open circles, measure- 
ments made in the opposite scan direction (y < 0), are similar to those 
for y > 0. Scan-angle measurements of less than 3 degrees were not made 
since the change in relative gain is small. The curve is well behaved out 
to a scan angle of 4.5 degrees (of the order of 12 beamwidths). Many 
measurements were made in the region of 4.5 to 6.0 degrees to confirm 
the presence of the shoulder in the curve. Clearly, the antenna can be 
scanned +3.5 degrees (18 beamwidths) with degradation in gain of only 
0.25 dB. A scan of +4.5 degrees (24 beamwidths) degrades the gain only 
1 dB. Thus, the contiguous United States could be served by a syn- 
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chronous satellite with a multiple-beam antenna of this type and suffer 
less than 0.5-dB degradation of gain. 

Figure 18 is a plot of feed displacement in the y and z directions re- 
quired to optimize the gain for a given scan angle of the beam. In a sense, 
this is the plot of the locus of focus—it is not a straight line. It was found 
that no displacement in x was necessary. 

Figure 14 shows plots of the 3, 10, and 20-dB beamwidths as a function 
of scan angle; the beamwidths increase with increasing scan angle. This 
increase in beamwidth limits the number of feeds that can be used on 
this type of antenna. 
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Fig. 11—Azimuth scan with beam displaced in opposite direction 9 = —5.32°. 
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Fig. 12—Relative gain measured as a function of beam scanning 0. 
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Fig. 13—Feed displacement as a function of beam scanning @. 
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Fig. 14—Plot of 3, 10, 20 dB level beamwidths as a function of beam scanning 0. 


IV. CONCLUSIONS | 

From the measurements, we conclude that multiple-beam antennas 
can be produced by using an offset Cassegrain with offset collectors as 
feeds. Further, they are suitable for both earth stations and satellites. 
We can scan a total of 18 beamwidths with only 0.25-dB degradation in 
gain; a scan of this order produces little change in radiation character- 
istics. 
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Detection and Selective Smoothing of 
Transmission Errors in Linear PCM 


By R. STEELE and D. J. GOODMAN 
(Manuscript received May 5, 1976) 


We consider detection of transmission errors in PCM by means of 
statistical hypothesis testing of the received quantized sequence. When 
errors are detected, a median filter is used to smooth waveform 
discontinuities. We describe two error detectors, one (CDC), based on 
correlation measurements, and the other (DDC), based on sample-to- 
sample difference measurements. While both offer s/n advantages over 
conventional PCM in the presence of errors, DDC is more promising both 
in terms of performance and simplicity of implementation. 


I. INTRODUCTION 


An acceptable decoded signal-to-noise ratio (s/n) can be maintained 
in a pulse code modulation (PCM) system in the presence of transmission 
errors if error detecting and correcting codes are added to the trans- 
mitted PCM signal. This approach!” when combined with a Huffman 
coder in juxtaposition to the PCM and channel encoders offers the best 
theoretical solution, given that the properties of the transmission channel 
can be specified. By best solution we mean that for a specified channel 
bandwidth and error rate, the highest decoded s/n can be achieved. 
However, this approach is not usually justified economically, and partial 
solutions may be appropriate. 

One solution is to retain a conventional transmitter and to modify the 
receiver of a PCM system to make provision for the possibility of trans- 
mission errors. Jayant? has observed that when the channel error rate 
is high, a linear or non-linear filter prior to the desampling filter reduces 
the noise due to transmission errors, but only at the expense of a deg- 
radation of speech quality when the channel error rate is low. This ap- 
proach is analogous to reducing the bandwidth of a high-frequency re- 
ceiver or introducing a noise filter in a high-fidelity system. 

In this paper, we discuss a system which, like one of those described 
by Jayant, uses a median filter*° to squelch channel error noise. How- 
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Fig. 1—PCM system with error protection. 


ever, in our system this filter is introduced selectively under the control 
of an error detector, which measures certain statistics of the received PCM 
signal and makes inferences about whether individual samples or short 
blocks of samples have been contaminated by channel errors. Only when 
errors are detected is the median filter introduced. We use the results 
of computer simulations to show that this approach can lead to signifi- 
cant improvements in system signal-to-noise ratio. Another detection 
and correction system has been proposed for differential PCM systems 
operating substantially above the Nyquist rate®. 


ll. ERROR DETECTION AND CORRECTION 


We have investigated the system shown in Fig. 1, using a third-order 
median filter as a corrector. When the detector infers the presence of an 
error in either an individual sample or a block of samples, it causes the 
switch to be in the right-hand position, thus introducing the median 
filter, which, at time m, replaces the sample g,, with the median value 
of the samples Gm+1, Gm, Gm—1. With no error detected, the switch is in 
the left-hand position and G, goes directly to the low-pass filter, which 
transforms the sequence of quantized samples to a continuous waveform. 
Higher-order median filters,° though less satisfactory in the absence of 
an error detector,? may well improve the performance of a selective 
correction scheme. Linear smoothing is also likely to be effective. 

The detector is essentially looking for an unexpected event in {g;}, and 
the more unexpected the event, the greater is the likelihood of detection. 
Correspondingly, large errors are more likely to be observed than small 
ones. Very small errors are unlikely to be detected due to their statistical 
similarity to the sequence {q;} at the transmitter. This characteristic is 
a limitation, although not too serious as it is the large errors that cause 
the greatest degradation of s/n. | 

We have studied two detectors, both of which process blocks of M 
samples and compute a statistic characteristic of each block. They also 
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’ Fig. 2—Correlation detection and correction. 


process shorter blocks (of length SB) imbedded in each long block, 
compute corresponding block statistics, and infer errors in a short block 
whenever its statistic is substantially different from that of the long 
block. In the correlation detection and correction system (CDC), the 
statistic is the first-order autocorrelation coefficient. In the difference 
detection and correction system (DDC), the statistic is the rms value of 
the sample-to-sample difference. In both cases, the long block length 
is M = 64 for speech sampled at 8 kHz. In CDC, the short block has SB 
= 16 samples while in DDC, SB = 2 and the block statistic is simply the 
magnitude of a single sample-to-sample difference. 


lil. TWO ERROR DETECTORS—DEFINITIONS 


3.1 Correlation detection and correction 


Figure 2 is a schematic representation of CDC. Correlation coefficients 
b and R; are computed over blocks of two different sizes. The coefficient 
b,computed over a long block of samples of length M, provides an esti- 
mate of the local correlation of the transmitted sequence. It is compared 
with {R;}, a sequence of correlation coefficients computed over short 
blocks of length SB, which lie within the long block. The presence of one 
or more errors in a short block can result in correlations substantially 
lower than 6 because channel errors are independent of the signal source. 
Thus, if R; is substantially lower than b, the system infers the presence 
of an error in the block of length SB and replaces the samples in that 
block: 


dj; Gj+1 sey Gj+(SB-1) 


with the corresponding outputs of the median filter. 
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Specifically, for the first M received samples, we have 
M-1 


—_ 





DL 4 ditt 
(SS a (1) 


Within this block, the detector computes R1, the correlation coefficient 
of samples Gg; to Gsp; Re is based on G2 to Gsp+1; etc. In general, 


R, = siditi t Gisidjte t+... + dj+sp-24j+SB-1 (2) 
(SB — 1) S;2 | 
where 
1 SB 
Sj a 2 (Gj4i-1)7; J= | ee ,M =O tl: (3) 
SB i=1 


The second long block begins at k = M — SB + 2 and provides a value 
of b to be compared with Ry_sp+2, Ru—sp+3,---,RoamM—sp+1). The third 
long block begins at k = 2M — 2SB + 8 etc. The time windows defining 
short blocks move one sample at a time; these defining long blocks slide 
over M — SB + 1 samples. 

A particular block j of SB samples is deemed to contain errors if R; 
is sufficiently smaller than )b; 1.e., if 


R; < Kb, (4) 


where K < 1 is a design parameter of the CDC system. 


3.2 Difference detection and correction 


This scheme, shown in Figure 3, is based on the notion that the dif- 
ferences between successive samples of a correlated input source tend 
to be relatively small. The detector infers that an unusually large sample 
difference is the result of a transmission error. Over the nth block of M 
samples the detector computes oy, the rms difference between successive 
samples, where 

1 (nt+1)M 


y. i= 0 (5) 


2= 
M nM+1 


Tb 
It then examines the magnitudes, | A; |, of individual sample-to-sample 
differences and if 


| Az = Lop, (6) 


the system replaces g, with the corresponding output of the median 
filter. We use for A; 


Ar = Gr — Qn, (7) 
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where Q), is the actual system output. It is G,-1, if | A, | < Loz; otherwise, 
it is a median filter output. 

Notice that when an error is detected, the DDC system replaces an 
individual sample with a median filter output, while the CDC system uses 
the median filter to modify an entire block of samples. This greater se- 
lectivity of the DDC system results in the modification of fewer correctly 
received samples than with CDC. This property accounts for the fact that 
DDC provides better measured performance than CDC. 


IV. PERFORMANCE EVALUATION 


To gain insight into the detection and correction mechanisms, we 
implemented both detectors on a general-purpose computer and studied 
their operation on PCM samples derived from an artificial, statistically 
stationary source. The initial simulations were efficient computationally 
and demonstrated the effects of design parameters and signal charac- 
teristics on s/n. They also indicated that DDC performs better than CDC 
in the presence of isolated channel errors. These simulations were fol- 
lowed by investigations (using both software and special-purpose 
hardware) of DDC operating on PCM-coded speech transmitted over 
binary symmetric channels. With speech transmission, the error sup- 
pression provided by DDC is clearly audible and the s/n characteristics 
are similar to those observed with the artificial source. 

In the next two sections, we present the results of the first simulations. 
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Fig. 4—CDC performance, 32-level quantizer. 


The source was a Gauss-Markov sample sequence with a correlation of 
0.85, chosen to be representative of speech sampled at 8 kHz. Channel 
errors were introduced by replacing source samples with quantized 
samples of a white Gaussian process independent of the source. 


4.1 CDC, Gauss-Markov source 


With applications to speech communication in mind, we adopted, for 
the length of the long blocks M = 64, an interval (6-10 ms for typical 
sampling rates) over which correlation properties are expected to change 
slowly. The choice of the length, SB, of the short blocks reflects a com- 
promise between the aims of obtaining: (i) reliable correlation measures 
(achieved with SB large), and (ii) accurate error localization (achieved 
with SB small). Unreliable correlation measures result in false alarms— 
spurious error detections—while imprecise error localization leads to 
the modification of a large number of correctly received samples. The 
other detector design parameter is K in eq. (4), which sets the threshold 
of error detection. A low value of K provides a stringent criterion, leading 
to fewer false alarms, but also fewer correctly detected errors than a high 
value of K. 

After studying the influence of SB and K on the false alarm and cor- 
rect detection probabilities, we arrived at SB = 16 as an appropriate 
compromise. Effective values of K range from 0.2 to 0.6, depending on 
the sample error rate (7). 

The dependence of s/n on error rate is shown in Fig. 4 for a 32-level 
quantizer and K = 0.2 and 0.6. Observe that for n > 0.002, the CDC system 
having K = 0.2 is preferable to an unprotected PCM system. For 7 > 0.006, 
the improvement in the received s/n is approximately 3 dB. The cbc 
system offers an improvement of a further 1 to 2 dB when 7 > 0.01, K 


404 THE BELL SYSTEM TECHNICAL JOURNAL, MARCH 1977 


UNPROTECTED SYSTEM 


RECEIVED S/N IN DECIBELS 





16 32 64 128 256 
NUMBER OF QUANTIZATION LEVELS 


Fig. 5—DDC performance, error-free channel. 


= (0.6. On the other hand, when the channel quality is reasonably high 
(ny < 0.002), the CDc degrades s/n performance relative to an unprotected 
system. Even with K = 0.2, the deterioration is more than 3 dB in an 
error-free channel, and it is more substantial when the number of | 
quantizer levels exceeds 32. The extra noise arises from the replacement 
of a block of 16 correctly received samples by median filter outputs 
whenever a false alarm occurs. This high cost of false alarms is a principal 
disadvantage of CDC. It does not exist in the DDC system, where an iso- 
lated false alarm introduces only one median filter output. 


4.2 DDC, Gauss-Markov source 


In this system, the detector design parameters are the block size M 
over which the rms difference signal, op», is calculated and L in eq. (6), 
which determines the criterion of error detection. As in the CDC system, 
we used M = 64 to provide a syllabic measure of the rms sample-to- 
sample difference signal, and investigated the effects on s/n of several 
values of L under various transmitter and channel conditions. 

There is an important improvement in the error-free condition com- 
pared to cbc. Figure 5 shows that for the criteria L = 3 and 3.6, the 
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Fig. 6—DDC performance, 32-level quantizer. 


degradation in received s/n due to false alarms is negligible for quantizers 
with 16 to 64 levels, and is only 2 dB for a quantizer with 256 levels. 
The variation of s/n with 7 is displayed in Fig. 6 for a 32-level quantizer 
and L = 3.6. The DDC system is superior to the unprotected linear PCM 
system. With DDC, the s/n decreases by approximately 3 dB compared 
to the 9-dB reduction of the unprotected system when 7 increases from 
~ 0.001 to 0.01. At a 1 percent error-rate DDC provides a 7-dB improvement 
in s/n. With larger quantizers, the dependence of s/n on 7 follows the 
curves in Fig. 6 for n > 0.003. At these error rates the major part of the 
received noise is due to transmission errors rather than quantization. 
Figure 7 shows s/n as a function of input power for a 32-level quantizer 
and 7 = 0.016. At low levels of input power, the quantized samples at the 
transmitter are generally quite small while transmission errors can cause 
very large samples to appear at the receiver. The resulting large sam- 
ple-to-sample differences are reliably detected making DDC particularly 
effective at low input levels, which accounts for the fact that with DDC, 
s/n depends only to a small extent on input power. This property of DDC 
is in strong contrast to unprotected PCM in which the noise due to 
channel errors is essentially independent of signal level and s/n decreases 
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Fig. 7—DDC performance, 32-level quantizer, 7 = 0.016. 


1 dB for each dB decrease in input power, as shown in Fig. 7. Because 
the s/n improvement at low signal levels does not depend on significant 
sample-to-sample correlations, the DDC system can be expected to 
perform well with speech signals which consist of essentially two wave- 
form types: (i) high-level, highly correlated waveforms of voiced sounds, 
and (11) low-level, uncorrelated waveforms of unvoiced sounds. Errors 
in both waveform types are detectable with DDC. 


4.3 DDC, speech inputs 


Encouraged by performance with Gauss-Markov inputs, we studied, 
by means of a computer simulation and a laboratory model, DDC systems 
operating on received PCM samples derived from a speech source. In the 
simulation, quantized samples were coded in a 5-bit sign-magnitude 
format and transmitted over a binary symmetric channel. The s/n per- 
formance, measured over an entire 2-second sentence, is shown in Fig. 
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Fig. 8—DDC performance, speech transmission, 32-level quantizer. 


8. The improvement introduced by DDC is immediately audible in the 
decoded outputs of DDC and the unprotected PCM receiver. 


V. DISCUSSION 


Two detection systems in connection with a smoothing filter corrector 
have been described and shown to give improvement in the s/n of a linear 
PCM codec in the presence of transmission errors. This improvement has 
been achieved without recourse to error detecting and correcting codes. 
The key element of both systems is a detector that allows the smoothing 
filter to be used selectively on the basis of inferences about errors in the 
received sequence. To date we have experimented with only one cor- 
rector, the third-order median filter. Although it increases s/n relative 
to an unprotected system, it is possible that other smoothing schemes 
offer even greater improvements. 

Of the two detectors, DDC has better s/n performance and is easier to 
implement. It does not involve the calculation of correlation coefficients 
and the rms value of the quantized difference signal is easy to measure. 
It also is well-matched to the characteristics of speech waveforms. 

There are existing and anticipated digital transmission systems in 
which performance is limited by channel quality. We hope that the ap- 
proach taken here will be of value in upgrading performance at the ex- 
pense of a tolerably small increase in receiver cost. Our method is also 
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applicable to signal enhancement in systems other than PCM in which 
distortions are characterized by short, severe signal discontinuities. In 
such systems (FM signals exhibiting clicks is an example), the disturbed 
signal can be digitized, selectively smoothed, and desampled in the 
manner described here to produce an improved output. 
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On the Angle Between Two Fourier Subspaces 


By J. E. MAZO 
(Manuscript received September 3, 1976) 


Examining an approximation inspired by equalization theory, we 
consider the minimum angle Qn between the subspaces of Hilbert space 
generated by the sequences {e*?*}N_y and {e'**}),,> ny. Here w © 
[—z,7] and the inner product for the Hilbert space involve a positive, 
bounded weight function r(w). The finite Toeplitz matrices R and T 
generated byr(w) and 1/r(w), respectively, play a crucial role, and, in 
fact, sin2Qy is the reciprocal of the largest eigenvalue of RY. In general, 
sin2Qy is shown to be bounded away from unity as N becomes large. 
The geometry of the problem enables us to give some results concerning 
the product matrix RT, which, out of the present context, may seem 
surprising. 


x 


I. INTRODUCTION AND SUMMARY 


Let H be the Hilbert space of square-integrable functions on [—7z,7z] 
with an inner product? given by 


1 w 
(r= 5- f Pelgr(o)de, (1) 
where the weight function r(w) is bounded and strictly positive; 1.e., 
0<rx<r(w) =R. (2) 


We call a Fourier subspace of H any subspace generated by a finite 
or infinite collection of functions of the form e'”“, n an integer. In par- 
ticular, we shall be interested in the Fourier subspaces [relative to the 
metric r(w)]: . 

Fn — | > fneine| 
—N r 
(3) 


Gu =| > gneine| 
|n|>N r 


+ The subscript r, as in (1), will be used when we wish to emphasize that the weight 
function r(w) is being used. No subscript will refer to an arbitrary inner product, while 
the case r(w) = 27 will be called the “usual” metric. The usual inner product will be written 
without a subscript as well. 
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for N = 0.1 If r(w) is constant, then the subspaces F'y and Gy are or- 
thogonal. We will be concerned with the minimum angle between F'n 
and Gy for a general weight function satisfying (2), and with the limiting 
behavior of this angle as N — o. (The concept of the angle between 
subspaces is not new to the engineering literature. See for example Ref. 
1.) 

The main results of our investigation are stated in terms of two finite 
Toeplitz matrices, R and I, which are generated by the weight functions 
r(w) and g(w) = 1/r(w), respectively [see eqs. (30) and (33) for precise 
definitions]. We also need the Fourier coefficients r, and g,, of r(w) and 
g(w). Then, we show 


1 
largest eigenvalue of RI 


T wT —1 
(ii) lim sin?Qx, x 2 E + es ff. r(w)dw X = ff. yt <1. 
2 = 7 r(w) 


No 1 x 27 


(1) sin2Qy _ 


(it) All eigenvalues of RI are > 1. 
(iv) ¥ |n|ragi<0. 


This is not a trivial inequality in the sense that >> °..r,g, = 1 is. 

The case r(w) = 1 + a cos w is solved exactly in Section V, showing that 
the obvious bound sin2Qy = rpin/Pmax is often loose. A better bound, still 
involving only this ratio, is given in (68). 

In somewhat general terms, this problem arose in the mean-square 
equalization theory of data transmission, where the question is one of 
bounding the effect of replacing tap weight values by certain Fourier 
coefficients. To be specific, let us ignore the effects of noise and note that 
the job of the equalizer is to invert the Nyquist equivalent channel. That 
is, if we had an infinite number of taps at our disposal, we would take the 
transfer function of the equalizer C..(w) to be 


1 0 
C..(w) “Xi 2? ; 


The equalizer transfer function Cy(_) when only the usual (2N + 1) taps 
are available can always be written as 


1 N ; . 
Cn(w) =~ DY be’ - YL enein. 
X(w) oN |n|>N 
In the above expression 6,, |n| < N are “corrections” to the Fourier 
coefficients ¢,, |n| < N. The mean-square error resulting from the 


¥ Any function of the form (3) is in H if and only if the associated coefficient sequence is 
square-summable. 
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equalizer Cy (w) can then be shown to be given by 


1 =z 

— > [X(w)Cn(w) — 1] 2#dw 

2 a 
N ; ; 2 
D dpeho + SY egethel [X(w)|2da. 
—N 


oe. f 
Qn J -x |k|>N 


The minimum mean-square error E2,,j, is the minimum of the above 
expression over the 6;. Now we can imagine taking the (fixed) vector 





e,eihw 
|k|>N 

and decomposing into a vector in the space F'y and one perpendicular 
to it. The part in Fy can be “subtracted off” by the choice of 6’s, leaving 
the remainder. The fraction subtracted off can never be greater than 
cos*Qy, where Qy is the angle between the two subspaces Fy and Gy 
when Xoq(w)? is used as the weight function r(w) for inner products. 
Thus, 


sin?Qq X llew [2s E2in s llen IZ, (4) 


1 T 
7 eae 
lev? =< 


The point of replacing the exact tap values of the finite equalizer by 
Fourier coefficients is not to replace one calculation by another. Rather, 
it is to supplement calculation by insight, since much is known about 
properties of Fourier coefficients. This will be done for a specific equa- 
lization problem involving timing recovery for finite equalizers in a later 
work. 


where 


2 
| Xeq(w) | ?2dw. 








[k[>N 


ll WARMUP EXERCISES 
The angle 6 between two fixed vectors, f and g, is defined by 


_, Re(f.g) 


fll lle 


If f and g are restricted to be in subspaces F and G, respectively, the 
infimum of (5) (call it Q) over all f and g (so restricted) is called the angle 
between the two subspaces. We easily see that 


If -gll? = Ilfll? + lle ll? — 2ilfll Xx llgllcos @, (6) 


and thus by minimizing the right member of (6) with respect to the norm 
of f, we have 


6 = cos 0 € [0,7]. (5) 


ui lf — gl? = sin?aQllg |. (7) 
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In fact, we can also calculate sin2Q via the formula 


sin29 = inf int el” 
| es f¢ lel? 
When (2) holds, the infimum angle between our subspaces F'y and Gn 
[given by (3)] is actually attained and its value is strictly positive. In fact, 
it follows from an application of a theorem by Paley and Wiener? that 
the two sequences (usual metric) 


(8) 


{bn} = vse eine| 


1 i 
Nol | ae | 
form a complete biorthogonal pair, i.e., 
(bn,Wm) = Onm (10) 
and 
h= X (bnh Wn =X Ynsh)on (11) 


for any h in Lo. Thus, either sequence in (9) forms a basis for Lg, or, 
equivalently, {e'"“}, forms a basis for H [with weight function r(w)]. Now 
iff € Fn, g € Gn, 


inf If ~ gill, 
8 


is attained when g is the orthogonal projection of f on Gy, and is a con- 
tinuous function of the finite dimensional f. Therefore, 


of. .Mf-all? 
in | int All | 


is attained, since we may restrict ||f||,, = 1 and thus are minimizing over 
a compact set. The basis property of {e’”“}, in H assures that the mini- 
mum is not zero. 

There are several ways to get at the minimum angle Qy between Fy 
and Gy. We shall begin by using (8) and the calculus of variations. 

However, before we begin to work on this, let us review an old problem 
of linear prediction (really, linear interpolation) theory. We are required 
to find the minimum value of 


B= (" 
20 —7 


over all /2 sequences {a,,,}, under assumption (2) for r(w). We let 


2 
1-— > ane’™’| r(w)dw (12) 
m0 





a(w) = » a,neim 
mx0 
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be any element of L2(—7,7) which has zero for its zeroth Fourier coeffi- 
cient. We then have 





E2= a f. [1 — a(w)|2r(w)dw. (13) 
The calculus of variations yields 
fda*[1 — a(w)|r(w)dw = 0 (14) 
or | 
fe™+[1 —a(w)|r(w)dw =0, n #0. (15) 
Thus, 
[1 — a(w)|r(w) = const = k. (16) 
From (13) and (16), on the one hand, we have 
7 2 2 7 
Bhin=5— [yy twlde= 5 fe (17) 


and, on the other, 
1 7 1 

Ee? =o f. dw(1 — a*){(1 —a)r] = “ f. (1-—a*)dw=k, (18) 
Qa =r Qa eed | 9 


since a* has no m = 0 term. Equating the results of (17) and (18) enables 
us to solve for k [note that k = 0 must be excluded under (2)], yield- 
ing 


rs | (19) 


1 7 dw 

20 s. r (w) 
If r(w) has a zero somewhere in such a manner that Jf1/r = ~, then (19) 
says E2., = 0. This turns out to be the correct conclusion for the infimum, 
but this infimum is never attained. As is well known, the calculus of 
variations can only be applied if the infimum is attained. In fact, in the 


present problem, if we set k = 0 in (16), we would conclude that there 
is an J) sequence {a,,,} such that 


1- + ane’™=0 ae, (20) 
m0 
which obviously cannot be. 


Another way to do this problem is to use the biorthogonal sequences 
(9). We note that all vectors of the type 


~~ andm (21) 


m0 
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form the subspace orthogonal to the vector Wo. Hence, E2;,must simply 
be the squared norm of the projection of ¢o onto Wo. This is (in the usual 
norm) 


| (¢0,%o) |? 
Qang2 = 9 \Po,¥o) |" 
|| @ol|2cos?(¢o,¥o) = II doll Iéol2Z|¥ol? 
ey (i Vr(w) a d 
Oe... 38.100 
™ dw) 1 dw 
Qa ron Mees r(w) Qa S. r(w) 


ll. FINDING THE MINIMUM ANGLE 


We proceed with the calculus-of-variations approach to finding sin?Qx 
via (8). Let 


f(w) = > frei EC Fy 


g(w) € Gy. (23) 


Then, if we vary g* in ||f — g||2, we obtain 
S dg*(w)[f(w) — g(w)|r(w)dw = 0 (24) 


for all allowed variations. Thus, (24) means [f — g]r € Fy, or, in other 
words, 


[f() — g(c)|r(w) = : bpeike = b(w) (25) 


for some numbers by. As in Section II, (25) permits us to write two ex- 
pressions for min||f — g||?. They are 


2 _ i Jb) 5 
min If— al? =>- fe (26) 
and 
= dwf*(w)b(w) = £*-b 2 
min lif — all 5 J. deft(w)(o) (27) 


The vector notation in (27) refers to a row of (2N + 1) numbers. Letting 
b=k(f+b,_),f*-b, = 0, we may equate (26) and (27), solve for k, and 


obtain 
N 2 
(%, ll?) 
min ||f — gl? = Fe a ; 
“ 3 ; P [ere Dryers 
: 7 1|-N —N 
min-— —+ Fe 28 
» On s- r(w) a 428) 


f-y=0 
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Thus, from (8) and (28) 








min 
sin2Qn = | 
>» Jul? =1 
1 
N 48 
' aad ; , : | » (u, +uwz)ee| | 
— f. >, uzne’*"| r(w)dw X min — dw— 
27 J-7 !—-N w TJ-a r(w) 
weu=0 





(29) 
If we introduce the Toeplitz matrix 
1 wT 1 
am = 5 § eiin—me—dy, |n|, |m|<N, (30) 
20 —7 r(w) 
the second term in the denominator of (29) has the form 
utTu + wtTw + 2utTw. (31) 


Expression (31) may be minimized over the appropriate w using a 
Lagrange multiplier, yielding 
1 
utT—lu 
Thus, 


utT-lu 
sin2?Q.y = min-——, 39 
eee (32) 


where FR is the Toeplitz matrix corresponding to r(w), Le., 
1 Rs 
Ram = 5 f eiln—m)e r(w)dw, |n|, |m] <N. (33) 
T = 


Finally, the minimization of (32) yields 


Theorem I. Let the matrices R and I be defined as in (30) and (33). 
Then the minimum angle Qn between the Fourier subspaces Fy and 
Gy, defined in (3), ts 


1 
1-0. = 34 
largest eigenvalue of RT (34) 


This theorem implies that Qy is invariant under the replacement r(w) 
— 1/r(w). 
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Since similarity transformations preserve eigenvalues, we note that 
the eigenvalues of RT are the same as those of VT RVT, which is, by (2), 
(30), and (383), a strictly positive definite Hermitian matrix. 

Before exploring consequences of (34), we shall rederive it from a 
geometric point of view. Let {¢;}f and {¥;}? be complete biorthogonal 
sequences of vectors for a Hilbert space. Let 


V={oit, We {diner (35) 

T = {wilt 
be subspaces generated by the indicated vectors. Note that the orthog- 
onal complement of W, W ,, is given by W, = T. Also note that our 
problem is equivalent to that of finding the minimum angle between V 
and W.Ifvu € V, and ais the angle between v and W (i.e., the angle be- 
tween v and its projection on W), and B is the angle between v and W ,, 
we havet 


ad ae (36) 


Thus, the minimum angle between V and W (call it Q) is the complement 
of the maximum angle between V and T’, called 0,4. Thus, we have 


sin?Q = cos26yy. (37) 


The spaces V and 7’ both have dimension N here. 

Let P represent the orthogonal projection operator onto 7, and Q the 
orthogonal projection operator onto V. It can be shown that if V € V 
is a vector in V which attains the minimum or maximum angle between 
V and T, we must have! 


QPv = dv, (38) 


where A 2 0 is the square of the cosine of the indicated angle. We shall 
see shortly that (34) is a form of (38) when we represent P and Q by 
matrices that are representations of the restrictions of P to V and Q to 
di 

We begin by deriving these matrix representations. For general 
biorthogonal sequences {¢,}, {y;}, let 


Rar = (dn, br) 


n,k = 1,2,---,N. (39) 
Pre = (Wn Wr) 


t If we call the projection of v on W by vj, then v — v1 is the projection of v on W ,. Since 
UV, Vy, and Uv — v, all lie in a plane, (36) follows immediately from a simple diagram depicting 
these three vectors. 

* Let a be any vector in the space such that Qa 0. Then if @ is the angle between Qa € 
V and W, we have cos26 = ||PQa||2/||Qall?2 = (a,QPQa)/(a,Qa). Vectors a which yield 
stationary values of this ratio of quadratic forms can be obtained by differentiating 
(a,QPQa) holding (a,Qa) constant (via a Lagrange multiplier 4). This procedure yields 
(38) upon setting Qa = v. 
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Any vector x can be written uniquely as a vector in V plus a vector in the 
orthogonal complement of V, i.e., 


N co 
x=Vadit DV dy. (40) 
1 N+1 


If we form the inner product of (40) with ¢;,7 = 1,2,---, N, we can cal- 
culate 


N 
ap = PD (R-!)n1(G1,x). (41) 
Thus, given any x, its projection onto V is simply 
N 
2 ai Gi (42) 
with a; given by (41). Similarly, the projection of x onto T is 
N 
2. bi; (43) 
with 
N 
bp = aX (T~*) 1 (¥u,x) (44) 


Hence, if we start with any vector v € V, 
N 
v= > vj di, (45) 
1 


the result of projecting it onto T and then projecting this vector back 
to V is another vector v” € V with components v,, given by 


a. aN 
Um = DY (ROT) midi. (46) 

i=1 
Hence, the operator equation (38) becomes the N X N matrix equa- 
tion 

(T'R)—-!v = dov- (47) 
Equation (34) is thus rederived, after an appropriate relabeling of in- 
dices. 


Since the reciprocals of the largest and smallest eigenvalues of RT have 
interpretations as squares of cosines of angles, we have 


Theorem II. Let matrices R and YT be defined as in (30), (33). Then all 
eigenvalues of RT are 21. 


IV. IMPLICATIONS CONCERNING SIN7Q, 


From (30) and (38), we see that the matrix elements of rl are given 
in terms of the Fourier coefficients r; and g; of r(w) and g(w) = 1/r(w). 
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Thus, (33) reads Rrm = rm—n. We see that, in this notation 


N 
(RT) nk = 2 Ek-ml'm—n; |72|, [R| =. (48) 


This can also be written 





_ 2N+1 
; ne : (w—w’) 
RT), aren iOS , 
(RT) np (np se 4 dwdw’. (49) 


sin 5 (w—w’) 


Equation (49) follows easily from (86), (33), and the identity 





2N+1 
sin 9 y 


ee: sin id 
2 
. If the largest eigenvalue of RT approaches 1 as N becomes large, then 
sin2Q, — 1, and the subspaces F'n and Gy described in (3) eventually 
become orthogonal. Equivalently, we have seen that the question be- 
comes the following: Does the largest angle between the subspaces (with 
the usual metric) 


| r(w)) 
Ry = inw ae 
~ ors |, 
and 
N 


Tartan 


approach zero? If we set N = ~, the two generated spaces are identical? 
(all of L2), so from this point of view it comes as a surprise that the lim- 
iting angle between F'y and Gy is bounded away from zero. 

Let us assume that r(w) has only a finite number of Fourier coeffi- 
ciencs; that is, assume r; = 0 if |j| > k. For this case, the reader may 
verify, using either (48) or (49), that the (2N + 1) X (2N + 1) matrix RT 
has the form (once N = k)! 


Gn = {ein 


TR N is never Gw for finite N unless r(w) = const. This follows from using (11) to show 
that you cannot expand each ¢,, n S |N| in terms of the yz, |k| <N. 

I To see this, let us evaluate (RT) 41 from (48). If [a] <(N —k), the summation in (48) 
may be extended from N = —@ to N = La sincer; = Oifj >k. Using the duality between 
7 ae by a resulting sum is then 1/27 = eC jcepGnele(jerpCb ole dw = dab, since 
£(w) = I/r(w 
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O an (50) 
1 
C 2.4 X y4 xX D 


That is, the first k rows and the last k rows are nonvanishing. The 
remaining 2(N — k) + 1 diagonal elements are exactly unity, while all 
other matrix elements vanish. The four k X k matrices in the corners are 
singled out for special attention and are labeled A, B, C, D. The capital 
X’s in the first and last rows are inserted only to indicate that elements 
in the first and last k rows are not vanishing, in general. 

As an example, we write the elements of A explicitly, labeling the el- 
ements of A by a,;, r,s = 0,1,---, Rk — 1; Le. ao = (RI)_n-n, ete. 
Then 


k 
= fie!!’e —1sw 


1 l=or 
te SS 51 
ee J. f(w) iy oe 
The elements of D can be determined from A using the property 
(RV) na = (RT)~-m,-1. (52) 


It is important to note that (for N = k) the elements of A and D do not 
depend on N. However those of B and C do. For example, the upper right 
corner of B is the element (RT)_y.n, given from (48) as 


k 
(RT )_-nw = Pa riZ2N-I- (53) 


Not only does (53) depend on N, but, by the Riemann-Lebesgue lemma, 
Son-1 > 0 as N increases for / bounded. As N increases, all elements of 
B and C similarly vanish. 

We now look further at the problem of calculating the eigenvalues of 
(50). If one is not an eigenvalue then the matrix RT — AI has 2(n — k) + 
1 diagonal elements (1 — A). By multiplying a row in which such an ele- 
ment occurs by the appropriate constant and adding the result to one 
of the first or last rows, all the elements indicated by “X”’ in (50) can be 
made to vanish. Clearly then, the eigenvalues that are not unity are given 
by the nonunity eigenvalues of the 2k X 2k matrix 


_TA B 
Kx|* ma | (54) 
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where B and C depend on N. All other eigenvalues of (49) are one. Since 
B and C vanish in the limit of large N, we havet 


lim Amax(N) = largest eigenvalue of A. (55) 
N—-oo 
The simple fact that Fy is never Gy for finite N implies that the largest 
eigenvalue of RT, and hence K, is strictly greater than one. Also the fact 
that all eigenvalues of RT are greater than or equal to 1 implies tr K = 
tr A+trD> 2k. But A and D do not depend on N and have the same 
trace. Hence, tr A > k, and A has an eigenvalue strictly greater than 
unity. Thus, from (34) and (55), lim sin?@Qy < 1. 
The above discussion implies the following:! 


Theorem IIT. If r(w) has only a finite number of nonvanishing Fourier 
components and ts not constant, then limn—.. sin?Qn < 1. 


Theorem IV. Let 1 =r(w) > 0 on |—2z,7]| have only a finite number of 
nonvanishing Fourier coefficients, rn. Set g(w) = 1/ r(w) and call its 
Fourier coefficients g,. Then 


ys ln |rnZn < 0, 
n=—o 
unless r(w) is constant. 


Proof. Using (49) for the product RI and the identity immediately fol- 
lowing it, we calculate 





tr RY — (2N+ 1) = (2N +1) i f- rw) Kn (w,w’)dwdw’ — |, 





2a x r(w’) 
(56) 
where Ky (w,w’) is the well-known Fejer kernel? 
_ sin’ SEE 3) 
K 0). 57 
noe!) = 5 ON + 1 4 eo 
sin? — (w — w’) 
2 
It has the following property. Let X(w) € Lo(—7,7), and let 
X(w) = ff. “Klee xia de: (58) 


+ The k Xk matrices A and D have the same eigenvalues. 
t The restrictions in Theorem III and Theorem IV to only a finite number of nonvan- 
ishing components of r(w) is removed in Section V. 
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Call the Fourier coefficients of X(w) and X(w), X, and X;, respectively. 
Then 





X,=X,[1- [n| | jn] = 2N+1 (59) 


IN+1]’ 


= (0 otherwise. 


k 
Thus, if r(w) = » rein, 


f- Ky (w,0")r(w)de = rw’) — tale (60) 


ret 


if only (2N + 1) =k. Substituting (60) into the right member of (56), we 
obtain 





R 1 w einw’ k a 
Yralnl = f° ~ dw’ = > r,[n|én- (61) 
—k Qa —rr(w’) —k 


Noting that we have already established that the left-hand side of (56) 
is strictly positive [r(w) ¥ const], the theorem follows. 


V. EXAMPLE AND FURTHER COMMENTS 
A particular example is provided by choosing 
r(w) =1+acosa, la] <1, (62) 
and, thus, k = 1. We calculate 
1 1 
A=D=5[1+ FS] 
po) Te 
V1—a? E 
_-l+vV1-a? 


a 


B(N) = CW) = a | >0, N>O, (63) 


When N = 0, we have, from (49), 


1 ™ dw 1 
Amax(NV = 0) = — dwx— —- = 
ES ae i. NO GOS oe =n i reo) </Eae 


and otherwise 


Amax(N) = A + B(N). (64) 
From (29) it follows that 
sin2Qy = ee (65) 
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where rmin 2nd Pmax denote the minimum and maximum of r(w), and it 
is interesting to compare numerically (64) with (65). Set a = 0.6, so that 
l'min/’'max = 0.25. Then Qn = 30 degrees from (65). On the other hand 
AmaxUV) = 1.125 + 0.125/9%, N = 0, which means Qy starts at about 63 
degrees and increases to 70 degrees. Equivalently, while (65) allows a 
factor of four between the upper and lower bounds of (4) for this example, 
the more exact evaluation has them differing by only 12 to 25 percent 
depending on N.? 

Exact solutions, as we have just found, may be useful for estimating 
Qn for some particular r(w). If we already know Qn for some other F(w) 
and if it is true that there are constants p,y’ = 0 so that 


F(w) 
1+uyu 


then, in a similar way to which (65) was derived, we can show that 





<r(w) = (1+ w’)F(o), (66) 


sin2Qy 
(l+yw)1 +p’) — 
Equation (67) could be useful when r(w) has a large or infinite number 
of Fourier coefficients. 
Proceeding further along the direction of bounds, we note that, using 
only rmin and rmax, (65) can be considerably sharpened. One can show, 
in fact, letting E = rmin/Tr max, that 


< sin2Qy < (1 +n) (1 + p’)sin2Qy. (67) 





1 1 Ly. | 
iIn’?Qy =|—+-— = 6 
sin’y 2 [5 +7 (E+5) | (68) 
The two basic ingredients are that [see (32) through (34)]| 
_ yTRy 
Amax(V) = soi ptT-ly (69) 
and 
iwit 
A7! 70 
VtAy = <y v (70) 


for any positive definite Hermitian matrix.4 Combining (69) and (70) 
yields 
(WtRY)(tTy) 

v4 
The right ~— of (71) is further = bounded by 


ee Ae | u(w)|2r(w)dw X — ~ S. |u(w)|? ate (72) 


Amax(V) S sup (71) 
v 


+ Another bound involving only the ratio rmin/P max is given in (68). For the present ex- 
ample it yields Qy = 53 degrees, a considerable improvement over (65). 
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where 
= f_ Ju (w)|2dw = 1, (73) 
Qa —T 


u(w) being any L» function, not just one having Fourier components u,, 
restricted to |n| s N. 
To maximize (72), consider maximizing 


Q= (Spiai) (Spi an (74) 


with >°p; = 1, p; 2 0, a; > 0 and distinct. Introducing a Lagrange mul- 
tiplier \ and differentiating, we obtain 


ay le, ree (75) 
aj Qa] 

for all nonvanishing p;. Whatever values the optimum p;’s take, we may 
regard the sums in (75) as fixed numbers, independent of the index 1. 
The resulting quadratic equation in a; can be satisfied by at most two 
values of a; and, hence, only two p; are nonvanishing, and are easily seen 
to correspond to the maximum and minimum aq; if we are to maximize 
Q. Also the two p; have equal values. Thus, (for a maximum Q) 








Qmax =] (Amax + din) ( ee ) (76) 


max Qmin 


and (68) follows. 

Our next theorem says something about the limiting behavior of 
Amax(JV), and in fact bounds the latter away from unity in the general 
case. 


Theorem V. If r(w) ¥ const. 
1 1 ™ 1 x dw 
lim Amax(NV) = — ie | d sam rete 77 
tin Amax(V) | a ar Mra Ge 


This immediately removes the restriction in Theorem III. Also, the 
left side of (56) is now, in general, bounded away from zero, and, by 
simple limiting procedures, Theorem IV also follows without restricting 
r(w) as was previously required. Of course, “—©” is included in the 
statement “<0.” 


Proof. We begin with a modified form of (69) (let y = I'd) which 

states 

oT RT 
olde 


Thus, any particular choice for ¢ provides a lower bound. We choose for 


Amax(V) = max (78) 
¢ 
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the components ¢; of ¢, |k| = N, ob, = 6-n». Inserting this choice into 
(78) yields 


(TRI)_n — 1 N 
Amax(N) = <A EN TN aa 2 §n+NE—N—-—mlm—-n 
'_n.-—N £0 nm=—-N 
e, 2N 1 
2. 8 £81" s— =5-| > 881 s— +80 |. (79) 
“Bos t=0 280 Ls,t=—2N 


Since? lim > £.21ls—t = 80, the theorem follows. 
-=N 
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+ Namely, 8ers—: becomes, in the limit, the sth component of 1/r(w) X r(w) which is, of 
course, 6,0. 
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Rate vs Fidelity for the Binary Source 


By S. P. LLOYD 
(Manuscript received May 5, 1976) 


Errors are deliberately introduced in the output of a binary message 
source to reduce the entropy rate. The errors depend on the source 
sequence in a deterministic shift-invariant manner. The tradeoff be- 
tween error rate permitted and reduction of entropy rate ts of interest. 
It ts shown that the ideal bound cannot be attained. If the errors are 
required to be produced causally, then a bound stronger than the ideal 
bound takes over. The quantities of interest are found explicitly for the 
example: change all 0’s in 0-runs of length 1 to 1’s. 


If a transmission channel has capacity C bits/second and a message 
source has entropy rate H bits/second satisfying H < C, then the source 
can be encoded, fed to the channel, decoded at the channel output, and 
recovered essentially without error after such handling. The rate-dis- 
tortion theory is concerned with the case where H > C; we try to mini- 
mize some measure of the errors that are necessarily present.! 

We treat here a special class of systems in which the errors are delib- 
erately introduced before submission to the channel to reduce the en- 
tropy rate to that of the channel; the mutilated source is then handled 
without further error by the channel. The usual treatment involves use 
of block codes, but we will examine the more interesting sliding (or 
shift-invariant) codes. | 

The source in Fig. 1 emits letters x,, -© <n < ~, at rate 1 per unit 
time. The letters are drawn from alphabet A = {0,1} with probability 
distribution P{x, = 0} = P{x, = 1} = 1/2, the same for all n, and the draws 
are statistically independent. We denote by x = (x,: -w <n <o)a 
sample sequence of the source process X. The entropy rate of the source 
is H(X) = 1 bit per unit time. 

The error generator operates on a source sequence x to produce a se- 
quence e = (e,: —~ <n < &) of A valued random variables e,, = e;, (x). 
The error at time n is a deterministic function e, = 
n(+**,Xn—13XniXn+1 °°+) of the whole sample sequence x. The measurable 
function 7 is the same for all n, so that the dependence of e on x is shift 
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SOURCE 






ERROR 
GENERATOR 


Fig. 1—Reducing the entropy rate by introducing errors. 


invariant: if sequence x is shifted m places, the sequence e = e(x) shifts 
m places with it. | 

The output of the adder box “®” in Fig. 1 is simply y, = x, ® en, with 
® the usual addition mod 2. We regard the output process Y as X cor- 
rupted by the errors E. Now, depending on how E£ is generated, process 
Y can have entropy rate H(Y) < H(X), and so can be handled by a 
channel of correspondingly smaller capacity at the price of the errors 
introduced. We are concerned with the tradeoff between the error rate 
and the decrease in entropy rate. Explicitly, suppose error rate « is 
specified, 0 < « = 1/2, and that n(- - - ;; -- -) is astationary error-generating 
function with the property Pie, = 1} = «. The resulting Y process will 
have a certain entropy rate H(Y) = H(X) determined by n. What is the 
least value that H(Y) can have for all such n? 


I. THE IDEAL BOUND 


Let us consider the joint process Z = (Y,E), where the Z alphabet is 
{(0,0),(0,1),(1,0),(1,1)} and each z,, in a sequence z = (2,: —-~7 <n <&) 
is the pair Z, = (yn,€,). The mapping 0: X — Z, which sends a sample 
sequence x to sequence z = Ox, is obviously shift invariant. The map 0 
is also measure preserving by definition; the probability measure on the 
space of sequences z is that induced by © and the X distribution. In the 
other direction, x, = yn, ® €n, —© <n < © is the inverse map ®: Z > X, 
which recovers the source sequence x if the compressed version y and 
the errors e are known. This map is also shift invariant and measure 
preserving. Since processes X and Z are isomorphic in the above sense, 
their entropy rates are the same: H(Z) = H(X) = 1. 

From the general theory (Section 6 in Ref. 3), the entropy rate H(Z) 
is the average conditional entropy 


H(Z) = H(z1|- «= ,2Z0) 
= H[(yz,e1)|- a ,(y0,€o)] 


of letter z;, given the preceding letters - - - 29. Using the addition law for 
conditional entropy, we find 
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H(Z) = H[e,|-++,(yo,e0)] + A[y1|- ++, (vo,e0) and e1] 
< H(e;) + H(y1|--+,yo) 
=h(e) + H(Y), 


since H(e,) = h(e) = € loge(1/e) + (1 — €) loge[1/(1 — &)] when Pie; = 1} 
= ¢, Pile; = 0} = 1 — c. Using H(Z) = 1, we have the lower bound H(Y) = 
1 — h(e), 0 s € S 1/2, for any such compression scheme. 

Our first result is: 


Theorem 1: For error rate 0 < « < 1/2 it is not possible to attain the 
bound H(Y) = 1 — hfe). 


Proof: For each fixed N 2 1, there holds NH(Z) = H(21, +++ ,zn|-++,20); 
by induction from 


H(z1,°°: ZN{° -* 20) = H(z\|- -* 20) + H(zo,--> 2N|° ** 21). 
Arguing as before, we find 
N= A[(y1,e1), oes (yn,en)|° ss ,(y0,€0)] 
= H[y1,+++.yn|-++,(v0,e0)] 
+ Hei, teas ,en|- a ,(yo,€0) and Vizeee YN]; 
moreover, 


A[y1,-++,yn|-++,(voeo)] S H(yn -++ yn le + + Yo) 
= NH(Y) 


and 
H(e,,-++,en|-++,e0 and--+,yn) Ss NH(e,) = Nh(e). 


Now, equality in this last step holds iff e1, -++,en,f(-++,e9 and +++ ,yn) 
are mutually independent, f is any measurable function of the variables 
indicated. (Equality in the first step requires that y,, «-+,vx be condi- 
tionally independent of ---,eo given --+,yo, but we will not need 


this.) 
For real valued variable u, let us define 
(x) u if a = 0, 
ul = 
l-—uifa=l1; 


we put also u()) = [uy] @ = [yu] for all a,8eA; note that uO = 
yD) =U, y(0)() = uy 61)(0) = ]1—u.From 


Xj = Yj @ e; 
= yjej + (1 — yj) — @)) 


— ,,(0),,(0)(0 1) ,,Q)(0 
= ye 4. ye) 
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and 
1-—xj=y; ® (e; @ 1) 
= yj(1 — ej) + (1 -— ye; 


1),()(1) 
= gen + y's e\ ) ’ 


it is apparent that 
xe) = > yee, 


where a8 are variables in the set A. Multiplying these equations together 
for 1 <j) = N gives 


x (ov) ane x (iN) = » gO » y Pv) nb ae yENe (or) (81) ngage NBN) 
1 N 


for each of the 2% choices for aj, +++ ,an. 

If H(Y) = 1 — h(e), then ej, ---,en,[y{8) --- y6] are mutually in- 
dependent for each choice of the 6’s. Since Efe} = «), 1 <j = N, we 
find 


1 
ons Eley) +e ™)} 


= eee Y elev... cen) (Bn) Ffy 0... y8} all a’s. 
Bi BN 


Using now the assumption « ~ 1/2, let c be the number c = —e/(1 — 26), 
so that c@) = (1 — €)/(1 — 2¢). From 


Y cCOHM = 1, YT cM la(6) = O85 
om 


a 
we obtain 


1 1 
— = or (ar(y1) 62. claNnv(YN) y% — 
ec c x oN 


< = 
2 foal CN 


= eee eee YH clvrVela(B) . .. Elan) lan) (BN) 
Q] aN PI BN 


x Efyy? «++ yi} 
= Elyy? +++"), all y’s. 
_ If this holds for all N = 1, then the {y,: —° <n < } are independ- 
ent identically distributed random variables with distribution Ply, = 


0} = Ply, = 1} = 1/2. The entropy rate of this process is H(Y) = 1 + 
1-—h(e). 0 
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ll. THE CAUSAL BOUND 


We now consider the case where each e,, depends only on the present 
and past values of the x’s. That is, e, = n(+ ++ ,X,-1;Xn), —-~ <n < ©, for 
n a measurable function of the variables indicated. The relation between 
Z and X is thus bicausal: z,, depends only on --- ,x,, and x,, depends only 
on --+,z,. It follows that conditionals given ---,z, agree w.p.1 with 
conditionals given «++ ,xp. 


Theorem 2: If the dependence of the error process E on X ts causal, then 
ACY) =1-—2¢,0 << 1/2. 


Proof: Setting A variant form of the basic inequality is H(Y) 2 1 —- 
H(eo|- ++ ,x-1), obtained from 


1 = H(Z) = Al(yo,e0)|- ++, (y-1.e-1)] 
= H[eo|- ++ (y-1,e-1)] + H[yo]--+,(y-1,e-1) and eo] 
= H(eo|- aie y=) 1 H(yo|- iia Vai): 


we have used only that y, @ e, is less informative than (yn,e,), 
—o <n s —1. The assumption that 7 is causal is not involved. 

Let us partition the space of sample sequences x into the four disjoint 
subsets: 


Ay = {x: n(-++ ,x-130;01,°-+) = O and nf: ++ ,x_131jx1, +++) = 0} 
Ag = {x: n(- ++ ,x-1305x1,-++) = Land n(-++ ,x-315x1, +++) = 1 
Ag = {x: n(- ++ x -1303x1,°°-) = Oand n(- ++ ,x-a51jx1,---) = 
Ag = {x: n(- ++ ,x-1303x1, +++) = Land n(- ++ ,x-151;x1, +++) = O}. 


The random variable x(x) is defined as the part namer for this partition; 
1e., K(x) = Juff x € Aj, 1 Sj = 4. Since x(x) depends only on coordinates 
-+*+,x_, and x1,--- of x, the conditional distribution of x9 given x is 
P{xo = O|«} = Pf = 1]x} = 1/2 w.p.1. The resulting random conditional 
entropies of €o,yo are seen to be 


h(eo|-++,x-1 and x1, +++) = h(eo|x) 


_ (0 for« = 1,2 

7 - for x = 3,4 
h(yol-+ + ,x-1 and x4, +++) = h(yo|x) 

_ {lfor« =1,2 

7 - for k = 3,4. 


Putting a; = P{A;},1 <i < 4, the average conditional entropies are 
then , 
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H(eo|---,x-1 and x1, +++) = H(eo|x) =a3+ a4 
H(yol: --,X%-,and xX1,-°+-) = H (yolk) = a, + ao. 


The error rate is 
] 
e= Pley= I} = 5 (as + aa) + ao, 


so we have 
H(eo|-++,x~-1 and xy, --+-) = 2e — 2ao < 2€. 


Assume now that E depends causally on X; then, eo is conditionally 
independent of xj, - -- given: -- ,x_1, implying H(eo|-++ ,x—1 and xj, ++) 
= H(eo|---,x-1). Combining the inequalities, we obtain H(Y) = 1 — 2e. 
O This bound is strictly above the ideal bound when 0 < « < 1/2, since 
h(e) > 2€ on this interval. 


Hil. EXAMPLE 

The following example is mentioned in Ref. 4, but a solution is not 
given. Let the errors be e€, = n(Xn—13XniXn4+1), —©% <n < ©, with 7 the 
function 


n(1;0;1) = 1 


n(x —1;X03x1) = O if x-1x9x1 # 101. 


The error rate is Pie, = 1} = 1/8. We will compute H(Y) and H(£) ex- 
plicitly and compare H(Y) with the bounds of Sections I and II. 

A graphical representation of 7 is given in Fig. 2. The vertices of the 
directed graph are the state pairs x_ 1x0, the arrows represent the tran- 
sitions from x_1X9 to X9X, and the value n(x —1;x9;x1) is shown on the 
arrow from x-1%9 to xox1. The corresponding graph of yo = 
Xo ® n(x~13;X0;x1) appears in Fig. 3. 

We now compute H(Y). Examination of Fig. 3 reveals that process 
Y is a renewal process, with renewal at the beginning of each run, either 


0 


0 
00 10 





01 11 
- 


Fig. 2—e9 = n(x-1;x0;x1). Values of eo as function of the transition. 
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0 
00 10 


1 1 


Fig. 83—yo = Xo ® n(x~-1;X0;x1). Values of yo as function of the transition. 


of 0’s or of 1’s. Moreover, the length R of a 0-run has the geometric 
distribution 





P{IR© = j} = J = 2,3,--- 


Qi-V 
The mean and entropy of this distribution are easily found to be E{R™} 
= 3,H(R) = 2. 

The 1-runs of Y involve the subgraph shown in Fig. 4, relabelled for 
convenience. A 1-run results from a path (driven by x) which starts at 
A and follows lettered arrows until exit occurs at B along the dotted 
arrow. If the length R™ of the run has value R“ = /, the driving x’s have 
probability 2-Y* per path, so 


‘ Vj ‘ 
PRO = he ep J = 1,2,---, 


where v; is the number of paths of length j from A to B along lettered 
arrows. 
For J = 4, we classify the paths of length j from A to B according to 
the earliest appearance of arrow a: 
(1) One path c(d);~2e which does not contain a. 


(tt) Paths which start ba --«. 
(111) For each 0 = k sjJ — 4, paths which start c(d),ea---. 


In (iz) the continuations “‘: --” are just the paths from A to B of length 


— a a oe oe ee ee 


Fig. 4—Subgraph for 1-runs of process Y. 
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j — 2, one each, and in (iit) the lengths of the continuations are J — 
(k + 3) for each O = k =) — 4. In consequence, 


jn-4 ; 
vp = 1+ vj_ot+ a Vi-(k+3), J ZA. 


The initial terms for the recursion are v; = 1, vg = 1,v3 = 2, clearly, and 
it is convenient to define vp = 0. Then from 


pp= lL tvj- got vj3t+++-+ 9, J 22, 
vj-y = 1+ vj_3t+-++++ v9, )= 3, 


it is apparent that 
Vj = Vj-1 a3 Vj—92, J = 3. 


That is, v1,v2, +++ is the Fibonacci sequence 1,1,2,3,5,8, «++. 

The generating function for the Fibonacci sequence is 2fv;x/ = 
x/(1 — x — x), as is well known, so the distribution of R has generating 
function 


2 RY == 55 |e) Se V6 =, 
is 


ee. ee 
4—2x —x 
Taking (d/dx),=1, we obtain E{R“} = 5 for the mean length of 1-runs 
in the Y process. For numerical evaluation of the entropy H(R")) of the 
R distribution, we obtain the r; = P{R“™ = j}, 7 = 1, from the recur- 
sion 


1 ; 
i a A | ee ees J 23; 


2 
1 
ry=-, fro= 
1 y 2 
The numerical result is 


H(R®) = 3° 7; logs— 
j=l 


rj 
= 3.593946 bits per run. 


Starting at the beginning of a run, suppose y is truncated after M runs 
of both kinds have occurred. The total number of coordinates y,, is the 
sum >![RO + R] of M independent samples each of R®,R™. The total 
random entropy is the corresponding sum >}"[h(R) + h(R‘)] for the 


samples. Omitting the detailed arguments, we obtain from the strong 
law of large numbers 
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5° [A(R®) + h(RY)] 
H(Y) = lim ————————-. wp. 


ee SRO + RY] 
1 


_ H(R®) + H(R) 
E{RO} + {RO} 

= (0.699243 bit per letter. 

As a check, note 
(1) 
Poo= l= soy 4 aR) 
which is clear from Fig. 3. The entropy of yo is H(yo) = h(8/8) = 0.954434, 
and the difference 
h(3/8) — H(Y) = H(yo) — H(yol- ++ .y-1) 
= I(yo,'+ + + .y~1)) 


= 0.255191 bit per letter 
is the amount by which Y fails to be a Bernoulli process. The ideal pound 
is 


H(Y) 21 —-h(1/8) 
= 0.456436 bit per letter. 
The bound of Section II is easily worked out to be 
H(Y) 21 — H(eo|- + ,x-1) 
= 1— (1/2)h(1/4) 
= 0.594361 bit per letter. 


The entropy rate H(E) of the errors can also be obtained from run- 
length considerations. Indeed, {e,, = 1} is just the event {x,, = 0 is a O-run 
of length 1} in process X. The 0-run lengths S® and the 1-run lengths 
S in process X each have the geometric distribution 


; : ae 
P{S© = j} = P{S® = j} = of J= 12. cee, 


as is well known. Let the run lengths after an occurrence of {S) = 1} be 
SY), S$, SY, SP, -.-, and let random variable J be the smallest vp 2 1 
for which S® = 1. Since P{S = 1} = P{S® > 1} = 1/2, we again have 
(1/2, 1/2) Bernoulli trials, i.e., 


I 
Pi = > op J =1,2,+-- 
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The number of intervening x,,’s is the 0-run length V = S{) + S{? + 


»-- + $0), + SP in the E process. The generating function for each S“ 
is 


x 
9—-%x 


and the generating function for S conditional on S® > 1 is 





y x/PIS® = J} = 











é 2 
¥ IPS = j[$ > 1} = ; 

5 o 

so we have 

= o ] x \iy x2 \J-! 
vane EGGS) 

ut | | 2 oi 22 NS 

x(2 — <x) 


ear eS er |x | = 1< 1.13968. 


Taking (d/dx),=; gives E{V} = 7, and since the 1-runs in E have length 
V = 1 w.p.1, we have the check 


EVO} 
Plen = N= Foy + BVO) 8 


For numerical evaluation of H(V)), we use the recurrence 


1 1 
UE, = UE-1 — — Up-o FH —UE-3, RZS; 
k k-1 ~ 7 Uk-2 g Uk-8 


1 


, 03> 
5 16 


0O [ee 


i: 
1 4’ 2 
satisfied by v, = P{V© = k},k 21. The numerical result is 


= 1 
H(V) = 3 vp loge — 
1 Vk 
= 4.061168 bits per run, 
giving 
H(V) + H(V) _ 


1 
E{VO)} + E{VM} ~ 8 
= 0.507646 bit per letter 


H(i) = H(V)) 


as the entropy rate of the errors. The entropy of eo being H (€9) = h(1/8) 
= 0.543564 bit per letter, the difference 
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h(e) — H(E) = H(eo) — H(eo|- + ,e-1) 
= I(€0,}+ + + ,e-1}) 
= 0.035918 bit per letter 


is the amount by which E fails to be Bernoulli. 
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Pitch-Adaptive DPCM Coding of Speech With 
Two-Bit Quantization and Fixed Spectrum 
_ Prediction 


By N. S. JAYANT 


(Manuscript received June 9, 1976) 


This paper is concerned with the utilization of speech waveform 
periodicities in differential pulse code modulation (DPCM) coding with 
2-bit adaptive quantization and time-invariant spectrum prediction. 
Our work is based on computer simulations of DPCM codes. We have 
studied pitch detectors based on autocorrelation and an average 
magnitude difference function (AMDF), and we have measured the 
benefits of predicting from a previous pitch period as functions of 
pitch-period-updating frequency and periodicity-indicating thresholds 
(for autocorrelation and the AMDF). We have compared several alter- 
native methods of utilizing past quantized samples (in the present and 
previous pitch periods) for providing speech sample predictions. We 
find the following combination to be attractive for waveform coding at 
bit rates in the neighborhood of 16 kb/s: 2-bit adaptive quantization 
with a one-word (2-bit DPCM word) memory, pitch detection performed 
on unquantized speech (preferably with an AMDF criterion) and a 
prediction scheme that uses fixed three-tap (short-term) prediction 
for nonperiodic waveform segments, but switches to an appropriate 
one-tap (long-term) predictor upon the detection of strong periodicity. 
With four sample utterances, the latter procedure results in an average 
SNR (signal-to-noise ratio) gain of 3.75 dB over a non-pitch-adaptive 
encoder. 


I. INTRODUCTION 


An important subclass of speech waveform encoders is characterized 
by the use of adaptive quantization and predictive (DPCM) encoding.! 
Time-invariant spectrum predictors are simple to implement and robust 
in the context of coarse quantization. The benefits of adaptive prediction 
are, however, well recognized and documented,” and the greatest 
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achievements in bit-rate reduction have in fact depended on the use of 
adaptive short-term (spectrum) prediction as well as adaptive long-term 
(pitch) prediction, as seen in the paper by Atal and Schroeder.* 

This paper is concerned with the relatively less documented combi- 
nation of adaptive pitch prediction and nonadaptive spectrum pre- 
diction. The study of this kind of prediction is motivated by the obser- 
vation that speech waveforms abound in highly periodic segments and 
by the conjecture that the use of this periodicity may provide a prediction 
potential that is substantial enough to obviate the need for adaptive 
short-term (spectrum) prediction. The attraction in this approach will 
evidently depend on the complexity of pitch detection itself. The pitch 
detectors used in this paper are based on autocorrelation and AMDF 
(average magnitude difference function) and are quite simple to im- 
plement; they are indeed much simpler than the mean-squared-error- 
minimizing pitch detector described in Ref. 4. Moreover, as discussed 
in Section IV, the success of pitch-adaptive DPCM does not depend 
critically on accurate pitch detection in the sense in which the term is 
used in formal speech research.° 

A thesis by Trottier® considers the possibility of simplifying the 
Atal-Schroeder encoder.4 Among other things, this thesis discusses 
simple pitch-detection algorithms, the criticality of a well-designed 
adaptive quantizer, and the inefficiency of approaches seeking to simplify 
adaptive spectrum prediction through the use of very few predictor taps, 
say two. An unpublished work of Grizmala’ provides one of the first 
proposals for a simple pitch-adaptive DPCM that entirely avoids adaptive 
spectrum prediction. Grizmala discusses AMDF-based pitch detection 
and fixed three-tap spectrum prediction for nonperiodic waveform 
segments. More recently, Xydeas and Steele report an instance of a 6-dB 
SNR gain for a fixed-spectrum DPCM encoder arising from the utilization 
of waveform periodicities.’ Finally the detection of periodicity based 
on autocorrelation and AMDF is documented in speech papers®?:!9 as 
well as in coding literature.!! 

One of the contributions of the present paper is the demonstration 
that fixed-spectrum pitch-adaptive DPCM is useful in the context of a 
specific type of adaptive quantizer that has received considerable at- 
tention in recent coding work.!*1° This paper also shows that AMDF- 
based pitch detection is slightly more effective than an autocorrela- 
tion-based procedure. The paper also demonstrates that, during periodic 
waveform segments, a simple one-tap predictor across the pitch period 
is more efficient than several multitap predictors involving many past 
samples in the present and previous pitch periods. Finally, the paper 
includes formal measurements of pitch prediction gain as a function of 
(¢) pitch-period-update frequency, and of (ii) thresholds that the AMDF 
and correlation functions should exceed for a waveform segment to be 


440 THE BELL SYSTEM TECHNICAL JOURNAL, MARCH 1977 


judged as periodic. Our results are all based on computer simulations 
of DPCM encoders. 

The results of this paper are expected to be relevant to speech wave- 
form coding at bit rates in the order of 16 kb/s. At this bit rate, the use 
of fixed spectrum prediction and adaptive quantization results typically 
in a quantization noise level that is quite easily perceived, while the so- 
phistication of adaptive spectrum prediction is often unwarranted, be- 
cause undesirable quantizer-predictor interactions begin showing up 
at around 16 kb/s in practical waveform coder designs.!415 Adaptive 
pitch prediction, on the other hand, appears to be a useful and robust 
sophistication at 16 kb/s. With this bit rate in mind, this paper will deal 
exclusively with two-bit quantizers for the DPCM coding of Nyquist- 
sampled (8-kHz) telephone-quality (200-3200 Hz) speech. Our numerical 
results refer to two female utterances, “The chairman cast three votes” 
and “‘The boy was mute about his task,” and two male utterances “A 
lathe is a big tool,” and “The boy was mute about his task.” These ut- 
terances will henceforth be labeled F1, F2, M1, and M2. 

The organization of the paper is as follows. Section II reeommends 
a slowly adaptive quantizer with a one-word memory, and Section III 
proposes a three-tap spectrum predictor. Section IV discusses pitch 
detection by means of AMDF- and autocorrelation-type procedures, and 
points out how pitch analysis can be performed either on quantized 
speech or on the original unquantized speech. Section V compares dif- 
ferent prediction algorithms for periodic segments, including the im- 
portant example of an appropriate one-tap predictor. Section VI mea- 
sures the gains of pitch-adaptive DPCM as a function of (1) the pitch- 
detection procedure, (11) AMDF and autocorrelation thresholds used in 
hypothesizing periodicity, (111) pitch-period-updating time, and (iv) 
prediction algorithms used for periodic waveform segments. Section VII 
summarizes performance figures for the four sample sentences and 
discusses results in the context of 16-kb/s waveform-coding. 


ll. TWO-BIT ADAPTIVE QUANTIZER 


Figure 1 shows a uniform four-level quantizer used for pitch-adaptive 
DPCM coding. The step-size A is adaptive. The adaptations are based 
on a one-word memory.!2!8 Specifically, the step-size is modified at every 
sampling instant by a multiplier that depends only on whether the 
magnitude of the previous quantizer output was 0.5A, or 1.5A,. Re- 
spective step-size multipliers make A,+; = /-A, or E2:A,. In the context 
of quantizing prediction errors across a pitch period, we have found that 
the most useful adaptations were ‘slow’ adaptations of the form:!? 


E, = 0.95; E> = 1.10. (1) 


As discussed at length in Ref. 12, values of optimal step-size multipliers 
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OUTPUT 





Fig. 1—A 2-bit adaptive quantizer. 


reflect the nature of the input signal spectrum, and the stationarity of 
the input variance. The step-size adaptations were subject to maximum 
and minimum values that were appropriate for the given peak speech 
amplitude of +1024: 


AMAX = 192, AMIN = 1.5. (2) 


Finally, nonuniform quantizers were not found to be very effective 
in pitch-adaptive DPCM using adaptive quantization. This had to do with 
the effect of DPCM predictions on the probability density function (PDF) 
at the quantizer input. The observation that nonuniform quantization 
is not very beneficial reflects the fact that predictions in DPCM cause a 
quantizer-input PDF that is more gaussian than the PDF of the original 
speech amplitudes. The latter, for example, can be modelled by a 
gamma-PDF for which nonuniform quantization is very useful.? 


ll. TIME-INVARIANT SPECTRUM PREDICTION 
A T-tap spectrum predictor is represented by 


T 
Xp = Yi as+ XQr-s, (3) 
s=1 
where X and XQ refer to input and quantized speech samples. 
In time-invariant (fixed) prediction, the coefficients a are matched 


to the long-term spectrum of speech via the corresponding autocorre- 
lation function, as described in Ref. 1. 
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Using a typical long-term spectrum characterization,’ the following 
designs have been used for fixed one-tap and three-tap spectrum pre- 
dictors: 


a, = 0.85 for T'= 1 (4) 
and 
a,;=1.10; ao=—0.28; az = —0.08 for T = 3. (5) 


These predictor coefficients are rounded values resulting from a spec- 
trum model where the speech autocorrelations are 0.825, 0.562, and 0.308 
for delays of one, two, and three 8-kHz samples, respectively. These 
autocorrelations are reported in Ref. 16 as the result of a study on a very 
large speech-sample base, and constitute slight revisions of very similar 
autocorrelations reported in Ref. 17. 

In coding our speech waveforms, the three-tap predictor provided a 
typical SNR gain of nearly 1 dB over the one-tap predictor. Spectrum 
predictions in this paper will henceforth refer to a time-invariant 
three-tap design, as in eq. (5). 


[V. MEASUREMENT OF PITCH PERIOD 


This section defines the AMDF- and autocorrelation-based pitch 
measurements used in our work, discusses the use of unquantized speech 
samples X or quantized samples XQ for the pitch analysis, and provides 
illustrations of pitch measurements. In general, pitch analysis will be 
based on a window W containing W contiguous speech samples Z (Z = 
X or XQ). The sampling instant when a pitch period is measured is de- 
noted by r, so that a current speech sample will be Z, (X, or XQ,, as 
appropriate). The pitch period is denoted by P, and P is assumed to have 
minimum and maximum values Pyyrn and Pmax, respectively. G; and 
G» are thresholds that can be used’ to hypothesize waveform periodicity 
with varying degrees of confidence. V is the pitch period updating time 
(see Section VI). 


4.1 AMDF-based pitch measurement 


Consider the average magnitude difference function 


AMDF(p) = AVERAGE|Z, — Zu—p]; 


p = Pwin,Pin + 1,---,PMax, (6) 
where the averaging is over all pairs (u,u — p) such that both Z, and Z,—p 
are in W. 


The AMDF pitch detector estimates the pitch period P to be 
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P= pEst 
if AMDF(pDEst) < AMDF(p) (7) 
for all p in the range (Pyin, Pmax) with the exception of Dest, and if 
AMDF(pgst) < Gy + AVERAGE(|Z,,|), for (8) 


all u in W. 

The value of G, is discussed in detail in Section VI. Typically, G; = 
0.5. With Nyquist-sampled (8-kHz) speech and for a single pitch-analysis 
procedure that should cover the expected range of p in both male and 
female speech, the following numbers seem appropriate:° 


Pwin = 16, Pax = 160, W = 256. (9) 


Notice that Py excludes the obvious minimum AMDF (0) at p = 0, and 
that the window length W is well in excess of the maximum anticipated 
pitch period Pmax. It turns out that this requirement (W > Pyax) is 
quite important for efficient pitch prediction and waveform coding. The 
range of the pitch-period search (16 < p < 160) is wide enough to cause 
frequent problems with multiple peaks in the AMDF function, and 
multiples of the fundamental pitch period are often picked up as P. 
Fortunately, however, this kind of error in pitch tracking appears to be 
quite harmless as far as pitch-adaptive waveform codes are concerned: 
the need is for a sequence of waveform samples {XQ} that provide good 
predictions of a current sequence {X} in periodic segments, and it seems 
to be immaterial whether {X} and {XQ} are one pitch period apart or n 
(>1) pitch periods apart. 


4.2 Autocorrelation-based pitch measurement 
Consider the autocorrelation function 
C(p) = AVERAGE(sgn Z,, - sgn Z,,—p); 
p = Pyin,Pmin + 1, +++ ,PMax; (10) 


where the averaging is over all pairs (u,u—p) such that both Z,, and Z,-p 
are in W and, furthermore, both |Z,,| and |Z,,-p| exceed an appropriate 
speech-clipping level 


Zcup = 0.64 MAX(|Z|Max, [Z| Max); (11) 


where |Z|\ax is the maximum speech magnitude in the first one-third 
part of W and |Z|%:ax is the maximum speech magnitude in the third 
one-third part of W. 

The autocorrelation-pitch detector estimates the pitch period P to 
be 


P = pest (12) 
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if 
C(prsr) > C(p) 


for all p in the range of (Pyin,PMax) With the exception of pgs, and 
if 

C(prsr) > Ge. (13) 
The role of Gg is discussed at length in Section VI. Typically Go = 0.2. 
Appropriate values of Pin, Pmax, and W follow (9). The nonzero value 
of Pin excludes the obvious maximum C(0) at p = 0. 

The center-clipping operation described by (11) is quite effective in 
mitigating spurious peaks in the C(p) function, such as peaks repre- 
senting a low first-formant frequency. Typically, autocorrelation pitch 
detectors work with speech that is low-pass filtered to, say, 900 Hz,° but 
such filtering was not used in our waveform coding program. 

The pitch-measurement techniques based on (6) and (10)—especially 
the autocorrelation method (10)—are easier to implement than the 
mean-squared-error-minimizing pitch detector described in Ref. 4, which 
is based on computing the autocorrelation of Z [this involves computing 
products of real numbers, instead of taking differences as in (6) or using 
one-bit numbers as in (10)]. The efficacies of AMDF- and autocorrela- 
tion-based pitch detectors have recently been calibrated in terms of the 
performance of several other pitch-tracking procedures.° 


4.3 Pitch analyses based on X and XQ 


Figure 2a demonstrates pitch analysis based on original, unquantized 
speech samples X. We see how the analysis window can be aligned so as 
to extend equally on either side of the current sample X, to be encoded 


Xr+1—w/2 Xr Xr+Ww/2 
(a) 
——— 
TIME 
rf —-—-—-——-——-— W------- sf 
XQr—w XQr-1 Xr 


(b) 
Fig. 2—Pitch analysis based on (a) unquantized speech X and (b) quantized speech 
Q. 
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Table | — Local and global minima/maxima in pitch-period search 
(G; = 0.84, G. = 0.2; speech sample: M2, analysis based on 
unquantized speech) 


* Minimization of Maximization 

p* Normalized AMDF of Autocorrelation 
29 — 0.33 

30 0.66 0.34 

31 — 0.35 

34 0.62 . : a 

37 — 0.38 

38 — 0.39 

95 a 0.45 

96 0.40 0.46 


* Pitch-period estimate = 96 samples 


(quantized); such alignment turns out to be quite critical for realizing 
the maximum potential of pitch-adaptive waveform codes. | 

Figure 2b shows the analysis of pitch based purely on past quantized 
samples XQ,-; (s > 0). Figures 2a and 2b apply equally to AMDF or 
autocorrelation analysis. 


4.4 Illustrative measurements of pitch 


Table I demonstrates examples of AMDF- and autocorrelation-based 
searches for the pitch period P. Entries in the table represent those local 
minima/maxima in the AMDF/C functions, which were below/above all 
previous local minima/maxima in the search for P (16 < p < 160). Also, 
only those minima/maxima that cross the G,/G»2 thresholds, eqs. (8) and 
(13), are listed. For both the AMDF and C functions, a global peak ap- 
pears at the pitch period P = 96. 

Table II provides a typical time plot of P (number of 8-kHz samples) 
for four different pitch-tracking techniques. The analysis refers to a 
sample segment from the utterance F1. Notice the remarkable closeness 
of X-based contours in columns 1 and 3. Notice also that with XQ-based 
analyses, the AMDF function tends to preserve pitch information much 
better than the autocorrelation measurement. 


V. PREDICTION ALGORITHMS FOR PERIODIC WAVEFORMS 


Figure 3 sketches a periodic waveform segment. P is the ‘pitch period’, 
X, is a current waveform sample to be encoded, and XQ denotes an al- 
ready quantized sample in the present ‘pitch period’ or in an earlier ‘very 
similar segment’ of the periodic waveform. 

Our prediction algorithms for periodic waveforms are linear, and they 
are of the general form 


as 3 
xX, = Xu Qyu-XQr-ut+ Y apiy- XQ;—p-p. (14) 
u= v=0 
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Table || — Pitch-period contours from four pitch-tracking 
techniques (speech sample: F1). Entries along columns are 
successive values of P (number of 8-kHz samples) 


Autocorrelation Autocorrelation 

AMDF of X AMDF of XQ of X of X 
2 2 2 19 
39 39 2 19 
78 39 78 19 
39 39 39 19 
39 39 39 39 
39 39 39 38 
39 39 39 39 
39 2 39 4] 
43 2 43 44 
40 2 40 oo 
Al 42 Al 25 
132 132 132 2 
134 134 134 2 
135 134 135 2 
57 135 57 2 
78 80 78 48 
157 157 157 50 
35 35 2 19 
2 2 2 19 

2 2 2 19 

2 2 2 2 

2 Z 2 18 

2 2 2 2 
eA Z 2 18 
2 2 2 2 

2 2 Z 2 
yi) 2 2 2 

2 2 2 2 

2 2 2 31 
35 2 35 34 
35 2 35 34 
35 35 35 35 
36 35 36 36 
36 35 36 36 
36 36 36 36 
| 36 37 oT 
37 37 37 Yi 
oa 37 37 37 
37 37 af ot 
37 37 oF ol 
Yi of af oO” 
3o7 37 OT 37 
75 37 15 a1 
75 75 ot af 
37 75 37 37 


We have considered many special cases of the general algorithm (14); 
Table III summarizes three interesting examples. 

The seven-tap predictor attempts a clever combination of spectrum 
prediction [see (5) in Section II]| and pitch prediction. This approach 
was proposed by Grizmala,’ who in turn was simplifying a formal pro- 
cedure of Atal and Schroeder.* The three-tap predictor is the simplest 
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Fig. 3—Prediction algorithms for periodic waveforms. 


nontrivial combination of the two types of prediction. It is suggested by 
a simple geometrical procedure of completing an idealized parallelogram 
with vertices at the topmost four dots in Fig. 3. Finally, the one-tap 
predictor is the simplest approach to pitch-adaptive coding and is sug- 
gested by the very strong correlations that are observed between X, and 
X,—p in highly periodic waveform segments. 


VI. DESIGN AND PERFORMANCE OF PITCH-ADAPTIVE DPCM CODER 


Figure 4 provides a block diagram of the pitch-adaptive DPCM coder. 
It is different from conventional DPCM! in the inclusion of a special 
predictor for encoding the periodic segments of the input waveform. The 
spectrum predictor is formally defined by (5) and the pitch predictor 
by (14). The switching between the two predictors is controlled by the 
crossings of appropriate thresholds G; and G2 (Section IV) by the AMDF 
or autocorrelation functions, respectively. The test for periodicity is done 
once every V samples. If the waveform is decided to be “periodic” as a 
result of the test, the pitch period P (coming out of the AMDF or auto- 
correlation measurement) is used in the predictive encoding of a current 
block of V samples. (Both the binary “periodic/nonperiodic” decision 
and the pitch period, if any, are updated for the next block of V sam- 
ples.) 


6.1 SNR, SNRV, and SNRSEG 


The design and utility of pitch-adaptive coders will be discussed using 
the following signal-to-noise ratio as a performance criterion 


SNR(dB) = 10 logi0| » x2/5 (%- x@,)], (15) 


r=] r=1 
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Table Ill — Three prediction algorithms for periodic waveforms 


Name of 

Predictor ay ao a3 ap Qp4+1  ap+e ap+3 
AVERAGER 0.5 0 0 0.5 0 0 0 

“7-'Tap” 1.1 —0.28 —-0.08 1 —1.1 0.28 0.08 

3-Tap” 1 0 0 1 —1 0 0 

“1-Tap” 0 0 0 l 0 0 0 


where N is the total number of input samples. 

In deference to the fact that the pitch-adaptive coding is performed 
in blocks of V samples, we consider an additional measure of perfor- 
mance for the Sth block | 


SNRV(S)(dB) = 


V-S V-S 
logo S  xP/ Sx - xa]. a6 
r=V(S—1)41 r=V(S-1)+1 
The average value of SNRV over the total input signal duration (over 
N/V input blocks) will be called the ‘segment-signial-to-noise ratio’ 
SNRSKEG (Ref. 18) 


SNRSEG = ——"Y" SNRV(S) (17) 
~ N/V x 


SNRV is an obvious indicator of local encoding quality; its average 
value SNRSKEG reflects aspects of quantizer performance that do not 


RECEIVER 





FIXED THREE—TAP 
O SPECTRUM Q 
PREDICTOR 








PREDICTOR FOR 
PERIODIC 
WAVEFORMS 






SWITCH CONTROLLED BY 
PERIODICITY INDICATION 
IN AMDF OR 
AUTOCORRELATION 
COMPUTER 


Fig. 4—Block diagram of pitch-adaptive coder. 
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Table IV — Comparison of prediction algorithms (utterance: F 1; 
number of blocks: 134; block length V: 64; pitch-detector: based 
on unquantized speech and AMDF; G; = 0.71) 


Predictor Averager 7-Tap 3-Tap 1-Tap 
SNR(dB) 10.3 13.1 13.3 14.4 
SNRSEG (dB) 15.0 16.5 16.8 16.8 


always come out from the conventional SNR measure.!® For example, 
the time variation of SNRV would provide an appropriate indication 
of the differential treatment of voiced and unvoiced waveform segments 
(this is seen in Fig. 5); also, occasional large samples of SNRV (associated 
with pitch-adaptive coding of highly periodic segments) would have a 
better chance of showing up in the final result if the performance mea- 
sure is SNRSKG, rather than the conventional SNR. 


6.2 Comparison of the prediction algorithms of Table Ill 


Table IV compares the performances of the four predictors in Table 
III for the DPCM encoding of a typical position of utterance F1. It is very 
interesting that the simplest of these predictors, the one-tap predictor, 
provides the best encoding. In fact, the rest of this paper will uniformly 
assume an appropriate one-tap predictor for periodic segments. 


6.3 Choice of decision thresholds G1 and G2 


Table V illustrates AMDF-based coding as a function of the periodic- 
ity-decision threshold G [see (8)]. A choice of G; = 0.84 appears to 
provide the best combination of SNR and SNRSKG. This value of G, 
corresponds to a 1.5-dB prediction gain [ratio of average magnitude of 
input X to average magnitude of prediction error e (see Fig. 4)|. The 
value of G, = 0.71 (corresponding to a 3-dB prediction gain) provides 
a performance that is very close to the maximum. In fact, Grizmala’ 
recommends the latter value of G, = 0.71. 

Table VI shows corresponding results for autocorrelation-based DPCM 
with G2 as parameter. One notes a broad optimum, with G2 = 0.2 rep- 


Table V — Effect of G; on AMDF-based pitch-adaptive DPCM (all 
parameters are the same as for Table | except that G; is nowa 


variable) 
G1 0 0.50 0.71 0.84 1.0 
SNR(dB) 9.3 14.2 14.2 14.4 14.5 
SNRSEG(dB) 12.5 15.2 16.6 16.8 14.9 
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Table Vi — Effect of Go on autocorrelation-based pitch-adaptive 
DPCM (all parameters are the same as for Table | except that the 
pitch detection is now correlation-based) 


Go 0.1 0.2 0.3 0.4 0.6 
SNR(dB) 13.6 13.8 13.6 13.3 10.3 
SNRSEG(dB) 14.6 14.5 14.3 15.8 14.3 


resenting a reasonable autocorrelation threshold for hypothesizing pe- 
riodicity; it is interesting that an SNRSEG criterion would dictate G2 
= 0.4. 


6.4 Comparison of pitch detectors: AMDF vs autocorrelation; X-analysis vs 
XQ-analysis 


Table VII compares, for optimal settings of G; and Go, the encoding 
performances of AMDF- and autocorrelation-based pitch measurements. 
Notice the slight superiority of the AMDF approach, especially from an 
SNRSEG point of view. Notice also that pitch analyses based on X (Fig. 
2a) are distinctly superior to those based on quantized speech XQ (Fig. 
2b). Finally, it is very significant that, in the case of XQ-based analyses, 
the value of SNRSEG is 3- to 5-dB higher than that of SNR. This indi- 
cates that even with XQ@-based designs, many periodic segments get 
encoded very well in a short-term sense (leading frequently to very good 
SNRV values that tend to boost the average SNRV-value SNRSEG). 
The above observation has been confirmed in informal listening tests. 
These tests have also shown that the quantization noise in XQ- based 
AMDF-coding tends to be “‘whiter” than the noise obtaining with the 
other three pitch-detection schemes of Table VII. 


6.5 Pitch-period update-time V | 


Table VIII shows coder performance as a function of how frequently 
the periodicity test is made, and a possible pitch period recomputed. 


Table VII — Comparison of four pitch detectors (all parameters 
are the same as for Table I, except that four pitch detectors are 
involved, and G,; and Gs are optimized for each case) 


Autocorre- 
Type of Pitch Analysis AMDF lation 
Basis of the analysis xX XQ xX XQ 
SNR-optimizing G-values (G, for AMDF, G2 for 0.84 0.84 0.20 0.30 
correlation) 
SNR(dB) 14.4 10.0 13.8 10.1 


SNRSEG(dB) 16.8 15.0 14.5 13.2 
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Table Vill — Dependence of performance on update time V; 
entries are SNR values in dB (female utterance: F 1; number of 
blocks: 134; male utterance: M1; number of blocks: 134; pitch 
detector: based on unquantized speech and AMDF; G, = 0.71) 


V 32 64 128 192 
Male = 12.1 11.4 9.8 
Female 15.1 14.4 12.8 a 


Recall that the update time assumed in Tables IV through VII was V 
= 64 samples (8 ms). Previous researchers*~’ have usually recommended 
V-values like 40 or 50. 


Vil. SUMMARY AND CONCLUSIONS 


Table [IX compares, for the complete utterances F1, F2, M1, and M2, 
the performance of pitch-adaptive DPCM coding with that of DPCM with 
a fixed three-tap spectrum predictor. Note that both of these coders use 
adaptive quantization. The conventional encoder uses a fixed spectrum 
predictor while the pitch-adaptive encoder includes a second adaptive 
one-tap predictor, which is switched in whenever an AMDF analysis on 
X suggests sufficient periodicity (G; = 0.84). 

We note that there exists across the four sample sentences an average 
3.8-dB SNR gain with pitch-adaptive coding. The better performance 
with female speech is not surprising, since for a given duration of a voiced 
speech utterance, the high- se female utterances have a greater 
number of pitch periods. 

Figure 5 provides a typical time-plot of pitch period P and local sig- 
nal-to-noise-ratio SNRV in pitch-adaptive coding. The example refers 
to asegment from F2. A pitch-period of zero in Fig. 5 indicates absence 
of periodicity. Notice the low values of SNRV for these nonperiodic 
blocks. Also, notice the cluster of three values of P ~ 133. These three 
estimates are obviously three times a true pitch period ~ 44. 

As mentioned earlier, the work in this paper was motivated by the 
desire to improve waveform encoder performance at bit rates in the order 


Table IX — Summary of DPCM encoder performance 


Median Pitch Number of 


(Number of Speech DPCM With no Pitch-Adaptive 
Sample 8-kHz Blocks Pitch Tracking DPCM 
Utterance Samples) (V = 64) SNR(dB) SNR(dB) 
Fl 36 240 10.0 15.0 
F2 40 288 14.0 18.0 
M1 90 | 192 11.0 13.5 
M2 92 245 11.0 14.5 
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SNRV IN DECIBELS 
P|ITCH—PERIOD ESTIMATE P IN 
NUMBER OF SAMPLES 





BLOCK NUMBER 
(BLOCK LENGTH = 64 SAMPLES) 


Fig. 5—Typical time variations of pitch period and local signal-to-noise ratio SNRV. 
(Data refers to a segment from utterance F2) 


of 16 kb/s. The 2-bit pitch-adaptive coders discussed need 16 kb/s to 
transmit prediction-error information; and if pitch-analysis is to be 
performed on uncoded speech, the transmission of this information to 
a receiver will entail an additional channel capacity of about 1 kb/s. This 
assumes that pitch-period samples are coded with 7-bit accuracy and 
updated (and transmitted once, say, every 56 samples (8 kHz X 7 bits/56 
= 1 kb/s). Alternatively, the coder can be used on a 16-kb/s channel if 
the sampling rate can be restricted to 15 kb/s/2 bits = 7.5 kHz. 
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Recently, a statistical-decision approach to the problem of voiced- 
unvoiced-silence detection of speech was proposed by Atal and Rabiner. 
This method was found to perform well on high-quality speech. How- 
ever, the five speech parameters used in the analysis were not found 
to be as good for telephone-quality speech. Thus, an investigation was 
undertaken to determine a suitable set of parameters that would pro- 
vide a reliable voiced-unvoiced-silence decision across a variety of 
standard telephone connections. A large number of parameters (70) 
were included in the investigation, including 12 LPC coefficients, 12 
correlation coefficients, 12 parcor coefficients, 12 LPC partial error 
terms, etc. Many of the parameters were immediately eliminated be- 
cause they provided almost no separability between the three decision 
classes. The remaining parameters were used in a knockout optimi- 
zation to determine the five best parameters to use for a voiced-un- 
voiced-silence analysis. Various error weights were investigated to see 
what types of errors occurred and how they could be minimized. Finally, 
the use of the Itakura two-pole spectral normalization was investigated 
to see its effect on the error scores. 


Il. INTRODUCTION 


In a recent paper, Atal and Rabiner described a fairly sophisticated 
method for reliably classifying segments of a waveform as voiced speech, 
unvoiced speech, or silence.! The analysis method used a statistical 
pattern-recognition approach to make this three-class decision. In an- 
other investigation, Rabiner et al. showed that the accuracy of the 
classification algorithm was quite high when the input signal was 
wideband; however, for telephone speech inputs, the accuracy of the 
classification degraded quite significantly.2 The reason for this result 
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was not that the method inherently broke down for telephone inputs, 
but instead that the particular parameter set effective for wideband 
inputs was not equally effective for band-limited inputs. Thus, the 
motivation for the work to be presented in this paper is to investigate 
the suitability of a large number of parameters as features for reliable 
voiced-unvoiced-silence classification for telephone-quality speech. 

Figure 1 shows a block diagram of the basic voiced-unvoiced-silence 
analysis algorithm. As shown in this figure, there are three steps in the 
method. First the speech is preprocessed. Generally, this preprocessing 
is asimple filtering operation; e.g., in the earlier work, a 200-Hz highpass 
filter was used to remove dc, hum, or low-frequency noise components 
present in the input signal. For telephone line inputs, we have considered 
somewhat more sophisticated preprocessing; namely, we have studied 
the use of a second-order inverse filter (as originally proposed by Ita- 
kura®) to normalize out the effects of varying telephone lines. 

The second step in the algorithm is the feature measurement stage. 
For wideband inputs, only five parameters were considered, namely: 


(1) Energy of the signal 

(iz) Zero-crossing rate of the signal 
(iit) Autocorrelation coefficient at unit sample delay 
(iv) First predictor coefficient 

(v) Energy of the prediction error. 


These measurements were shown to provide a high degree of separability 
between the three classes of signal for wideband inputs.! However, for 
telephone-quality inputs, the band-limiting of the telephone line con- 
siderably reduces the effectiveness of all of the parameters in separating 
the classes of voiced speech, unvoiced speech, and silence. For example, 
the absence of signal energy above about 3 kHz significantly reduces the 
number of zero crossings for unvoiced speech. 

To find an effective set of parameters that would be capable of reliably 
distinguishing between the three signal classes for telephone line inputs, 
a large number of parameters (70 in total) were studied. Using a set of 
training data, the probability-density functions for each of the param- 
eters were estimated. Those parameters that provided little or no sep- 
aration between voiced speech, unvoiced speech, and silence were 
eliminated from consideration. The remaining 36 parameters were 
studied as to their effectiveness in classifying telephone line inputs. A 
knockout type optimization was used to obtain the five most effective 
parameters for classifying signals according to an error-weighting 
scheme. Several combinations of different test sets of data and error 
weights were investigated. 

The final step in the analysis method of Fig. 1 is a distance computa- 
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FEATURE DISTANCE 
SPEECH PREPROCESSING MEASUREMENT COMPUTATION 


Fig. 1—Block diagram of silence-unvoiced-voiced classification system. 


tion to determine whether a test signal is voiced, unvoiced, or silence. 
For this step, the non-Euclidean distance metric of Ref. 1 was retained 
because of its invariance properties to linear transformations of the 
data.4 

Before. presenting the results of the investigation, it is worthwhile 
reviewing the major distortions of telephone line signals as compared 
to wideband signals recorded with a high-quality microphone. These 
distortions include: 


(1) Band limitation—The frequency response of a telephone line is 
approximately band limited between 300 Hz and 3000 Hz. 

(ii) Phase distortion—For the frequency band between 300 and 3000 
Hz, the magnitude of the incoming signal remains relatively flat; 
however, the phase is altered significantly in this band. 

(111) Nonlinear effects—Various nonlinearities occur in telephone 
transmission, including amplitude distortion (signal fading), peak 
and center clipping, impulse and/or gaussian noise addition, 
crosstalk, etc. 


The effects of the first type of distortion are the most significant as far 
as this analysis method is concerned.* However, the other types of dis- 
tortion can, and often do, play a role in determining an effective set of 
parameters for classifying telephone line signals. 

The organization of this paper is as follows. In Section II, we present 
a description of the techniques used to determine the most effective sets 
of five parameters for classifying the incoming signals. In Section III, 
we present the results of the knockout optimization tests for each of the 
test sets of data and for.each set of error weights. Finally, in Section IV, 
we compare the results on telephone inputs to those obtained with 
wideband inputs. A typical example showing how the method ultimately 
performed is presented to illustrate the types of problems that occur with 
telephone inputs. 


* In this work we are considering only those distortions that occur within a local PBX; 
thus, one would expect a minimum of phase distortion and other nonlinear effects to occur. 
The place in which such distortions can become significant is in long-distance transmis- 
sions. 
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‘Il. TELEPHONE SIGNAL ANALYSIS SYSTEM 


For the preprocessing step of the analysis, a single highpass filter was 
always used to eliminate hum, dc offset, and low-frequency noise. This 
filter is described in Ref. 1. A second type of preprocessing was also in- 
vestigated: the spectrum normalization technique as originally proposed 
by Itakura.? In this technique, the gross long-time spectrum of the signal 
is estimated using a two-pole LPC model, and then the signal is inverse 
filtered to remove the gross spectral tilt. Using the two-pole spectral 
normalization to reduce the spectral variability should, theoretically, 
also make the feature estimates more reliable. The rationale for con- 
sidering this form of preprocessing is that for telephone speech the in- 
dividual telephone transducer and line responses vary greatly across 
different handsets and telephone lines. Thus, any features estimated 
over such varying conditions may be adversely affected by the inherent 
variability of the transmission medium. 

The way in which the two-pole spectral normalization was imple- 
mented is shown in Fig. 2. For each frame (10 ms of data), three corre- 
lation coefficients, R(m), m = 0,1,2, are computed using the relation 

N-m 
Rj(m) = & sj(n)sj(n+m) m= 0,1,2, (1) 
n=0 
where N is 100, the sampling frequency is 10 kHz, and ) is a frame 
counter that goes from 1 to NF, the number of frames in the utterance. 
The weighted normalized averages of the first two correlation coefficients 
(the m = 1, m = 2 terms) are computed as 


—— 1 NF R-(m) 
R(m) == + + W,(m), = 1) 2 

(m) NG x R,(0) j(m), m (2) 
where W;(m) is a weight on the correlation function of the form 
1 if R;(0)>T 


Me 0 otherwise 


(3) 


i.e., only frames whose energy [R;(0)] exceeds a fixed threshold T are 
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Fig. 2—Block diagram of two-pole spectral-normalization system. 


458 THE BELL SYSTEM TECHNICAL JOURNAL, MARCH 1977 


included in the computation of the average correlations. The factor NC 
in eq (2) is the number of frames that exceeds the threshold of eq. (3). 
The weighting is used in computing the average correlations to eliminate 
unvoiced sounds and silence for which the correlation values are sig- 
nificantly different from those for voiced frames. 

The third step in the normalization procedure is to compute the pre- 
dictor coefficients of a two-pole linear predictive coding (LPC) match 
to the long-time average gross spectrum. If we denote the two LPC 
coefficients as a; and ag, then the inverse filter needed to normalize the 
speech spectrum has a transfer function 


A(z) = 1 —ayz7! — aoz~?. (4) 


On a frame-by-frame basis the inverse filter can be applied directly to 
the autocorrelation coefficients of a high-order LPC analysis of the signal 
by convolving them with the autocorrelation coefficients of the sec- 
ond-order inverse filter.? 


2.1 Features used in the analysis 


The parameters (features) studied in the course of this work included 
the following: 


Param- 
eter Description 
1-12 The LPC coefficients of a 12th-order analysis using the Burg lattice method with 
a 10-ms frameé®*®: a(1) to a(12). 
13-24 The first 12 autocorrelation coefficients of the signal using a 10-ms frame: ¢(0,1) 
to #(0,12). 


25-36 The first 12 parcor (partial correlation) coefficients of the signal: k(1) to 
k(12). 

37-48 The first 12 partial normalized error coefficients of the LPC analysis: E(1) to 
E(12). 


49-60 The first 12 cepstral coefficients of the signal as obtained by transforming the 

LPC coefficients: c(1) to c(12). 

61 The log energy of the signal: LE. 

62 The number of zero crossings per 10-ms frame: NZ. 

63 The log normalized error of the 12-pole LPC analysis: LNE. 

64 The maximum value minus the minimum value of the signal during the frame: 
ML. 

65 The absolute energy in the first difference of the signal: ED. 

66 The number of zero crossings per 10-ms frame for the first difference signal: 
NZD. 

67 ‘The maximum value minus the minimum value for the first difference signal: 
MLD. 

68 The absolute energy of a smoothed version of the signal: ES. 

69 The number of zero crossings per 10-ms frame for the smoothed signal: NZS. 

70 The maximum value minus the minimum value for the smoothed signal: MLS. 


Figure 3 shows the basic measurement scheme. For each 10-ms frame 
of the signal, an LPC analysis was performed using the Burg lattice 
method giving a set of 12 LPC coefficients, 12 parcor coefficients, and 
12 partial normalized errors defined as 
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Fig. 3—Block diagram of feature-measurement system. 
l Bis 
E(1) a et [1-k (7)], l= 1,2, 2 +12, (5) 
L= 


i.e., H(/) is the normalized error of an |-pole LPC analysis. Since the lattice 
method does not require the set of correlations directly, they are com- 
puted on the signal from the equation 


o(0,i) = SDs — i), b= 12 ee 19, (6) 
n=0 


l.e., a nonstationary correlation function is computed. The cepstral 
coefficients are computed directly from the LPC coefficients using the 
recursion relation 


; t-1 m 
c(t) =a(i)-— > |, clm)al —m), 1<:1< 12. (7) 
m=1 

Two other measurements are made directly on the signal s(n). These 
are the zero-crossing count defined as the number of zero crossings per 
10-ms interval, and a computation of the difference between the maxi- 
mum and minimum signal amplitudes in the frame. 

In addition to the above parameters, six additional measurements are 
made on the first difference of the signal, d(n), defined as 


d(n) = s(n) — s(n -— 1) (8) 
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Fig. 4—Frequency response of lowpass smoothing filter. 


and a smoothed version of the signal obtained via the filtering rela- 
tion* 
§(n) = —s(n) + s(n — 2) + 2s(n — 3) + 4s(n — 4) + 4s(n — 5) 

+ 4s(n — 6) + 2s(n — 7) +s(n — 8) —s(n—-10). (9) 


It can be seen that the filtering of eq (9) can be accomplished without 
the need for a multiplier and, thus, can be implemented quite efficiently. 
Figure 4 shows the frequency response of this filter. It can be seen that 
the filter provides a small amount of high-frequency attenuation and 
therefore can be considered as a lowpass smoothing filter. The mea- 
surements made on d(n) and §(n) are zero-crossing count, absolute en- 
ergy, and difference between maximum and minimum signal levels in 
the frame. 


“This filter as well as parameters 65-70 were suggested by D. R. Reddy for inclusion in 
this work. 
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SILENCE 





LOG ENERGY IN DECIBELS 


Fig. 5—Probability distributions for log energy of the signal for silence, unvoiced, and 
voiced classes. Both estimated and gaussian fits to the distributions are shown. 


Once the initial set of 70 parameters was chosen, a training set of data 
was used to estimate one-dimensional probability functions for each of 
the parameters and for each signal classification. A one-dimensional 
gaussian curve having the same mean and standard deviation as the 
measured distributions was also computed for each parameter:” Figures 
5 through 7 show three typical distributions for the parameters log energy 
(feature 61), first LPC coefficient (feature 1), and twelfth LPC coefficient 
(feature 12), respectively. For the log-energy parameter (Fig. 5), the 
distributions for silence, unvoiced, and voiced speech were fairly well 
separated with means of 18, 34, and 49 dB, respectively. Similarly the 
distributions for the first LPC coefficient (Fig. 6) were also well separated 
with means of —0.19, —0.66, and —1.9 for silence, unvoiced, and voiced 
speech, respectively. However, as shown in Fig. 7, the distributions for 
all parameters were not well separated across the different classes. In 
this case, the distributions for all three signal classes overlapped con- 
siderably. It seems reasonable that features in which such behavior is 
observed will not be effective in the classification procedure. Therefore, 


> For the distance metric used in this work, it is not critical that the one-dimensional 
distributions of the parameters be well approximated by a simple gaussian curve. It is 
important, however, that the distributions be unimodal. 
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Fig. 6—Probability distributions for first LPC coefficient for silence, unvoiced, and voiced 
classes. 


such parameters were not considered in the testing to be described in 
this paper. 

A total of 34 of the 70 parameters were eliminated in this manner. The 
parameters eliminated were the higher LPC coefficients [a(5) to a(12)], 
the higher autocorrelation coefficients [¢(0,5) to ¢(0,12)], the higher 
parcor coefficients [k(5) to k(12)], the higher cepstral coefficients [c (5) 
to c(12)], and the last two partial normalized LPC error coefficients 
[E(11) and E(12)]. The remaining 36 parameters were used in all the 
optimization tests described in the next section. 


2.2 Knockout optimization procedure 


To choose the set of five parameters out of the remaining 36 features 
that best (most accurately) classified signal intervals as silence, unvoiced, 
or voiced speech, a knockout optimization procedure was used.’ Figure 
8 shows a flow diagram of the procedure. Using a testing set of data (see 
Section 2.3) and an objective error measure, the knockout optimization 
proceeded first to find the single best parameter for separating the three 
classes. The best parameter is knocked out and used in combination with 
each of the remaining 35 features to find the best pair of parameters for 
the signal classification. This process of knocking out the best parameter 
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Fig. 7—Probability distributions for twelfth LPC coefficient for silence, unvoiced, and 
voiced classes. 


and combining all knocked-out features with the ones remaining in the 
parameter set was iterated until a total of five parameters were ob- 
tained. 

Several comments should be made about this procedure. First, it is 
noted that this method does not necessarily yield the optimum set of five 
parameters for making the silence, unvoiced, voiced decision. In general, 
the resulting parameter set is suboptimal since only a very small subset 
of the total number of combinations of 36 parameters taken five at a time 
are considered in this method. In defense of the method, however, one 
can argue that, within the constraints of the procedure, an optimal set 
of the 36 parameters is chosen. Furthermore, at least theoretically, the 
addition of each new knocked-out feature reduces the error score. Finally, 
it is argued that the resulting feature sets provide significantly better 
accuracy for signal classification than almost any randomly chosen set 
of five of the 36 parameters. 


2.3 Distance computation 


The distance computation used throughout this investigation was the 
non-Euclidean distance metric defined in Ref. 1. For the feature vector 
x = [x(1),x(2), .. ..x(5)] with mean vector m; = [m;(1),m,(2), . . ..m;(5)] 
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Fig. 8—Flow chart of knockout optimization algorithm. 


and covariance matrix A;, the distance computation was of the form 
d; = (x — m;)A;'(x — m;)¢, (10) 


where 1 = 1 (silence), 2 (unvoiced), or 3 (voiced), and t denotes the 
transpose of a vector. For each signal class, d; is computed and the de- 
cision rule is to select class i such that d; < d; for all 7 1;1.e., choose the 
class with the minimum distance to vector x. 

To implement the distance computation in eq. (10) during the 
knockout optimization required the computation of a new covariance 
matrix A; for each subset of parameters being considered. Thus, on the 
order of 420 covariance matrices had to be estimated in a typical opti- 
mization run. This represented a substantial amount of computation. 


2.4 Experimental procedure 


The formal evaluation of the feature sets was made by choosing a fairly 
large data base of different utterances and different speakers, using part 
of the data base for training the system, and using the remainder of the 
data base for testing the system. 
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A total of five speakers (two male, three female) were used in the 
telephone-line evaluation. Each speaker recited three utterances,* each 
one over a new dialed-up connection and thereby ensuring that a dif- 
ferent telephone-transmission path was obtained for each utterance. Two 
different telephone transmitters (carbon microphones) were also used 
in the test. One utterance from each speaker was used in the training set; 
the remaining two utterances were used in the testing set. 

For each recorded utterance, a manual analysis was performed on each 
10-ms interval to classify it as voiced, unvoiced, or silence based on both 
the acoustic waveform and a phonetic transcription of the utterance. 
Each signal classification was further modified with a label as to the 
certainty of the manual classification. The labels used were: 


(1) Absolutely certain—clear characteristics of the class to which it 
was assigned. 


(11) Moderately certain—generally a boundary interval between classes 
in which two types of signal were present. 

(iit) Uncertain—classified primarily on linguistic information about 
the utterance. Included in this class were voiced fricatives, voiced 
stops, and certain transients (including some telephone-line tran- 
sients). 


Figure 9 shows an example in which uncertain intervals occurred. This 
section of speech is from the beginning of the word cowboys. The initial 
intervals should linguistically be labelled as either silence or unvoiced 
speech corresponding to the stopgap and burst of the voiceless stop /k/. 
However, acoustically the initial seven intervals (as marked in Fig. 9) 
show properties more similar to voiced speech than to silence or unvoiced 
sounds. These intervals were treated as uncertain intervals and were 
marked as unvoiced speech for testing purposes. 

For the training set, only those intervals for which the classification 
was absolutely certain were used. For the testing set, three sets of data 
were used. One set contained only those intervals for which the classi- 
fication was absolutely certain (TS1). The second set contained both the 
moderately certain as well as the absolutely certain intervals (TS2). The 
third set contained all the intervals, regardless of the certainty of manual 
classification (TS3). In the next section, we present results for each of 
these testing sets of data. 


lll, RESULTS 


The knockout optimization procedure described in Section II was run 
on the three sets of test data using 10 different error-weighting matrices. 


* Each utterance was a carefully chosen sentence containing a mixture of voiced, un- 
voiced, and silence intervals. 
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Fig. 9—The acoustic waveform for a section of speech in which an interval was uncer- 
tain. 


In addition, the entire experiment was rerun on data preprocessed using 
the two-pole spectral normalization method described in Section II. 
Table I provides a summary of the three test sets of data, the 10 error- 
weight matrices, and the two processing conditions. 

The error-weight matrices were used to study the effects of weights 
for each type of classification error on the overall error rate and the choice 
of the optimal features. The definition of a general error-weight matrix 
is as follows. If we let E denote the overall error score in classifying the 
data of a test set, then 


E = Nss Wss + NsuWsu + Nev Wsv + IN ig Waa FN wi WV ins 
+ NuvWuv + NosWos + NouWou + NovWov, (1) 


where N,» is the number of frames of a class a which were classified as 
belonging to class b, and Way is the weight attached to this pair of clas- 
sifications. It should be clear from eq. (11) that 


Ns = Ness + Neu + Nov 
Nu = Nus + Nuu + Nuv (12) 
Ne= Nos + Nog Noo; 


where N, is the number of frames in the test set in class a. Table II shows 
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Table |— Summary of factors considered in the investigation 





Data Test Sets TS1 Absolutely certain intervals 
TS2 Moderately certain intervals added to TS1 
TS3 Uncertain intervals added to Ts2 


Error-Weight WM1_ Uniform matrix 

Matrices WM2 Silence weighting matrix 
WM3__—sUnvoiced weighting matrix 
WM4 “Voiced weighting matrix 
WM5__ Silence-to-unvoiced weighting matrix 
WMsé__— Unvoiced-to-silence weighting matrix 
WM7 _ Silence-to-voiced weighting matrix 
WMs__svVoiced-to-silence weighting matrix 
WM9__—*Voiced-to-unvoiced weighting matrix 

WM10_  Unvoiced-to-voiced weighting matrix 


Preprocessing Pi Direct transmission 
P2 ‘Two-pole spectral normalization 





the 10 weight matrices described in Table I. Each matrix is expressed 
in the form 


Wss W; u W, U 
Wus Wuu Wu , 

W= 13 
Wos Wou Ww 


where W,, is not generally the same as Wag. 

As seen in Table II, error weight-matrix 1 (WM1) attaches equal weight 
to all six types of misclassifications and, therefore, is the canonic error 
matrix for the three-class problem. Error matrices 2-4 (WM2-WM4) each 
choose a subset in which one of the three classes is essentially merged 
with another class. For example, error matrix 4 (WM4) gives 0 weight to 
errors between the classes of silence and unvoiced speech; however, the 
other four types of error have unity weight. Thus, this matrix serves to 
distinguish most effectively between voiced speech and nonvoiced (either 
silence or unvoiced) speech. As another example, error matrix 2 (WM2) 
gives 0 weight to errors between the classes of voiced and unvoiced 
speech. Thus, this matrix serves to distinguish between speech (voiced 
or unvoiced) and silence. As such, it would be useful for speech-detection 
applications. Error matrices 5 through 10 each focus on only one of the 
six sets of misclassifications. The results for these cases give a lower 
bound on the error rate for special cases in which only a single type of 
misclassification is considered. 

For each of the sets of data of Table I, the knockout optimization 
procedure was used giving the five best features and the resulting overall 
misclassification rate, defined as 


E 


Ey = —————_— 
v (N, + Ny + Ny) 


(14) 
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Table Il — Error-weight matrices used in the investigation 


0 1 1 0 1 1 0 1 0 
1 0 Il 1 O O 1 O l 
1 1 O 1 0 0O 0 1 0 
WMI1 WM2 WM3 
(a) (b) (c) 
001 01 0 00 0 
0 0 I 0 0 0 1 0 O 
1 1 QO 0 0 0 0 0 0 
WM4 WM5 WM6 
(d) (e) (f) 
0 0 1 0 0 0 0 0 0 
0 0 O} 0 0 0 0 0 0 
0 0 0 1 0 0O 0 1 0 
WM7 WMs8s WM9 
(g) (h) (i) 


Wee 
Soo Ss 
Soo 
OHS 
ae 


WM10 
(3) 


For error weights 5 through 10 (where only a single misclassification was 
counted) the overall misclassification rated was defined as 


E 


En = Nn,’ (15) 


where 


N,; for WM5 and WM7 
N,=/N, for WMé6 and WMi10 - (16) 
N, for WMs and wM9 


The results of these experiments are presented in Tables III through VI. 
Tables IV through VI present the misclassification rate results, and 
Table III gives both the parameter numbers and the mnemonics of the 
five parameters chosen by the optimization procedure. The results in 
these tables are presented sequentially; i.e., the results obtained using 
only / of the five parameters (/ = 1,2,3,4) are indicated in the appropriate 
rows of the tables. 

Two comments should be made about the data. In many cases, it was 
found that the overall misclassification rate did not monotonically de- 
crease as more features were knocked out of the parameter set. For these 
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Test 
Set 


TS1 


TS2 


TS3 


Parameter 


ORWNe OhohH 


OT GN 


* Other features provided the same overall misclassification rate. 


Table IIl —- Optimal features chosen by the knockout optimization for telephone inputs. 
Weight Matrix 
WMI WM2 WM3 WM4 WM5 WM6 WM7 WM8 WMg 
61-LE 65-ED 14-4(0,2) 63-LNE 65-ED 70-MLS* 67-MLD  68-ES* 63-LNE 
14-6(0,2)  26-k(2) 68-ES 44-F(8)  27-k(3) 2-a(2) 
63-LNE —_16-4(0,4) 69-NZS __68-ES 67-MLD* 68-ES 
‘67-MLD 4-a(4) 65-ED 
64-ML 27-k(3)* 
61-LE 65-ED 14-4(0,2) 63-LNE 65-ED 64-ML 67-MLD  68-ES* 63-LNE 
50-c(2) 26-k(2) 68-ES 66-NZD = 27-k(3) 70-MLS* 2-a(2) 
16-¢(0,4) 69-NZS_  45-E(9) 67-MLD* 68-ES 
70-MLS 65-ED 
67-MLD 27-k(3)* 
61-LE 65-ED 14-4(0,2) 63-LNE  65-ED 50-c(2) 67-MLD  68-ES* 14-(0,2) 
50-c(2) 26-k(2) 68-ES 66-NZD = 27-k(3) 52-c(4) 43-E(7)* _70-MLS 
16-(0,4) 69-NZS 45-E(9) + 67-MLD* __68-ES* 63-LNE 
3-a(3) 70-MLS 68-ES 
64-ML 61-LE 


WM10 


46-E(10) 
42-E(6) 
43-15(7)* 


46-E(10) 
42-E(6) 
62-NZ 
25-k(1) 
45-E(9)* 


46-E(10) 
42-E(6) 
40-E(4) 
43-E(7) 


66-NZD 
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Table IV — Error rates for telephone inputs 


Weight Matrix 
Parameter WM1 WM2 WM3 WM4 WM5 WM6 WM7 WMs WM9 WM10 


TS1 Without Two-Pole Spectral Normalization 





1 15.1 4.7 9.8 2.8 2.4 0.7 24 0 0.9 1.0 
2 11.0 4.7 5.7 2.3 0.6 0 0.5 0.7 
3 7.5 4.4 5.3 1.9 0 0.4 0.3 
4 6.9 1.9 0.3 
5 6.4 
Weight Matrix 
Parameter WM1 WM? WM3 WM4 WM5 WM6 WM7 WMs8 WM9 WM10 
TS1 With Two-Pole Spectral Normalization 
1 13.8 5.1 11.0 5,1 2.4 4.6 2.4 0 3.2 6.8 
2 9.7 7.1 4.5 0.7 BZ 2.1 
3 8.6 6.0 4.4 2.1 2.3 
4 7.6 6.0 3.9 1.9 
5 7.6 1.8 
Size of Training and Testing 
Sets for TS1 
Number of Frames 
Training Testin 
S 207 328 
U 210 306 
V 539 1180 
956 1834 


LEAD ECR nO de eee Pale ey A RA eer 





Table V—Error Rates for Telephone Line Inputs 


Weight Matrix 
Parameter WM1 WM2 WM3 WM4 WM5 WM6 WM7 WM8 WM9 WM10 
TS2 without two-pole Spectral Normalization 


1 16.2 5.8 12.2 4.5 2.4 0.5 2.6 0 1.8 2.7 
2 11.7 5.4 7.4 4.3 0.8 0 1.0 2.1 
3 10.8 7.1 3.6 0 0.7 13 
4 3.5 0.5 1.1 
5 3.5 

Weight Matrix 


Parameter WM1 WM2 NM3 WM4 WM5 WM6 WM7 WM8 WM9 WM10 
TS2 with two-pole Spectral Normalization 


1 18.1 7.2 14.4 8.6 29 3.9 an | 0.3 4.5 9.5 
2 13.4 10.4 8.2 1.1 0 5.2 6.1 
3 12.4 94 7.8 3.9 5.0 
4 11.6 7.4 3.8 4.2 
5) 11.5 ee 3.6 3.7 


Size of Training and Testing Sets for TS2 
Number of Frames 


Training Testing 
S 207 375 
U 210 378 
V 539 1196 


956 1949 


cases data are presented up to the number of parameters at which the 
error rate kept decreasing. The second comment concerns the specific 
features knocked out in the optimization (as given in Table III). In many 
cases, a large number of features (other than the ones presented) pro- 
vided essentially the same overall misclassification rate as the feature 
that was knocked out. These cases are indicated by an asterisk after the 
feature number in these tables. For such cases, features other than the 
ones indicated in the table may be equally appropriate. 


IV. ANALYSIS OF THE RESULTS 


Several important observations can be made by examining carefully 
the results of Tables III though VI. First, it can be seen by comparing 
error rates for matrix WM1 to those for matrices WM2 through WM4 that 
most of the overall error rate for the canonic error matrix was due to 
misclassifications between the classes of silence and unvoiced speech* 
(compare results for WM1 and WMs3). This result is certainly not unan- 


* Further evidence of this result is given in Table VII, which shows a breakdown of the 
error components. This table is discussed later in this section. 
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Table V! — Error rates for telephone line inputs 


Weight Matrix 
Parameter WM1 wM2 WM3 WM4 WM5 WM6 = —-WM7 WM8 WM9 WM10 


TS3 Without Two-Pole Spectral Normalization 





1 18.5 6.3 13.4 6.3 2.4 0.9 24 Q 3.3 5.9 
2 13.2 5.6 9.0 6.1 0.8 0.2 2.1 0.4 
3 12.2 8.7 6.1 0 0 1.3 5.0 
4 12.0 8.7 5.2 1.1 4.1 
5 Lit 5.0 3.4 
Weight Matrix 
Parameter WM1 #£.WM2 WMs3 WM4 WM5 WMg6 WM7 WMs8 WM9 WM10 
TS3 With Two-Pole Spectral Normalization 
1 20.3 8.2 16.2 10.3 2.9 4.5 29 0.2 4.9 12.5 
2 15.6 | 12.2 9.5 1.3 0 5.9 14.3 
3 14.3 11.2 9.1 4.8 11.4 
4 13.7 8.7 4.5 9.8 
5 13.4 8.5 4,4 8.7 


em eee 
Size of Training and Testing Sets for TS3 


Number of Frames 


S 207 379 
U 210 443 
Vs 839 1264 

956 2086 


I 


ticipated since the band limiting of telephone speech has the most severe 
effect on unvoiced sounds whose spectral components often fall above 
the high-frequency cutoff of the telephone transmission. 

Based on the above result it would seem reasonable to compare error 
scores using error matrices 1 and 4. It can be seen from the tables that 
if one does not consider distinctions between silence and unvoiced 
speech, then an improvement of somewhat more than 2 to 1 in error score 
is obtained. For the case of absolutely certain classifications, an error 
rate of 1.9 percent is obtained for error matrix 4. For test sets TS2 and 
TS3, the error rate for error matrix 4 increases to 3.5 and 5.0 percent, 
respectively. 

The results using error matrix 2 (the speech detection matrix) show 
that, in the case of absolutely certain classifications (Table IV) an error 
rate of 4.3 percent is obtained. For test sets TS2 and TS3, the error rate 
for matrix 2 increases slightly to 5.4 and 5.6 percent, respectively. 

The results of using error matrices WM5-WM10 show that the most 
frequent misclassification occurs between silence and voiced speech in 
which error rates on the order of 2 to 3 percent were obtained for all three 
test cases. The problem here occurs during low-level sounds, such as 
voiced stops where the silence regions are often classified as voiced due 
to the presence of low-frequency components of the signal. Unfortu- 
nately, such signals do not fall neatly into either category and the deci- 
sion algorithm consistently classified them as voiced sounds whereas the 
manual classification was silence. 

Comparisons of the results of Tables IV, V, and VI showed that the 
error scores increased with the complexity of the test set as anticipated. 
However, it is difficult to attach too much meaning to the absolute error 
rates for TS2 and TS3, since the frames which were added constituted 
boundary frames and frames which were subject to classification error 
in the manual classification. The results are presented to provide in- 
formation as to the sizes of the increases in error rate that are to be ex- 
pected with such input test sets. 

The data of Table III (the optimal feature list) are also quite inter- 
esting. The influence of the weight matrix is evident by scanning across 
the rows of the table. Each weight matrix had its own set of optimal 
features, which were different from those of any other weight matrix. 
By scanning down the columns of this table, however, it is seen that the 
influence of the data test set was fairly weak in that the optimal-feature 
set remained substantially the same for all three test sets across the three 
sets of data. 

An interesting result shown in Tables IV through VI is that the two- 
pole normalization scheme did not provide essentially any improvement 
in the classification accuracy across any of the test conditions studied. 
This result is a little surprising in light of the work of Itakura who found 
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that it compensated different telephone transmission conditions quite 
adequately. One possible reason for this result is that the non-Euclidean 
distance metric to some extent compensates automatically for the 
variable telephone transmission conditions by appropriate linear 
transformation of the feature space. Thus, for this classification method, 
the use of a two-pole spectral normalization is of little value. 

An additional breakdown of the error analysis for the most important 
error-weight matrices (WM1, WM2, and WM4) is given in Table VII in 
which the percentage of each type of misclassification is presented. It 
can be seen in this table that certain types of errors dominated the scores. 
For example, no cases occurred throughout the test in which a voiced 
interval was classified as silence. It can also be seen that, as mentioned 
previously, the error rate for silence-to-unvoiced speech dominated the 
overall error rates for error matrices WM1 and WM2, whereas no single 
component of the error dominated the overall error rate for matrix 
WM4. 


4.1 Comparison with wideband results 


Although some numerical scores were presented in Ref. 1 for mis- 
classification rates using the analysis method on wideband (high-quality) 
data, aset of companion results were obtained in this study to compare 
and contrast the error results for wideband and telephone signals. Using 
the identical procedures discussed in Sections II and III, a set of optimal 
features and error rates were obtained for wideband test sets of signals. 
The results of these runs are presented in Tables VIII and IX. Com- 
parisons of the error rate tables (VII and VIII) show the following: 


(1) For error weight matrix WM1, the scores for wideband data were 
from two to three times lower than for telephone data. This is due 
to the vastly improved scores on the category of silence-to-unvoiced 
errors. The error rates for many of the other possible misclassifi- 
cations were quite comparable. 


(it) For error weight WM4, the scores for wideband data were only 
slightly better than for telephone data, indicating that a voiced-not 
voiced decision can be as reliably made over a telephone line as for 
high-quality inputs. However, the speech—not speech decision is 
much more difficult for telephone data than for wideband sig- 
nals. 


(111) For error weight matrix WMg2, the scores for wideband data were 
from two to eight times lower than for telephone data. This result 
is again due to the improved performance in discriminating between 
silence and unvoiced speech for wideband data. 
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Table VIl — Breakdown of error percentages for telephone line inputs 
Error-Weight Matrix WM1 
Test 
Set SU SV US UV VS VU S U V Overall 
TS1 16.8 6.1 4.6 5.2 0 0.9 22.9 9.9 0.9 6.4 
TS2 30.1 5.9 2.9 5.0 0 4.1 36.0 8.0 4,1 11.0 
TS3 28.8 5.8 2.9 15.4 0 2.5 34.6 18.3 2.5 11.7 
Error-Weight Matrix WM4 
Test 
Set SU* SV US* UV VS VU Sts U** V Overall 
TS1 93.0 4.6 0 3.0 0 0.9 97.6 3.0 0.9 1.9 
TS2 88.9 6.7 0.3 6.1 0 1.6 95.7 6.4 1.6 3.5 
TS3 43.1 6.1 32 8.6 0 3.4 49.2 11.8 3.4 5.0 
Test Error-Weight Matrix WM2 
es 
Set SU SV US UV* VS VU* S U** yrs | Overall 
TS1 11.6 5.2 7.6 17.5 0.1 4,7 16.8 25.1 4.8 4.4 
TS2 14.8 5.4 7.5 27.7 0.2 7.0 20.2 35.2 i be 5.4 
TS3 15.4 5.6 19 27.6 0.2 9.4 21.0 35.5 9.5 5.6 


* These results had 0 weight in the overall error score and, therefore, did not affect the choice of features. 
** Only a single component of this error score is included in the overall score. 
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Table VIIl — Breakdown of error percentages for wideband inputs 


Error-Weight Matrix WM1 








Test Set SU SV US UV VS VU S U V Overall 
TS1 23 0 3.9 2.6 0 1.5 23 6.6 1.5 22 
TS2 8.7 5.8 5.7 6.7 0.1 2.3 14.5 12.4 2.4 5.3 
TS3 8.9 5.9 ee 6.8 0.1 23 14.8 14.0 2.5 5.7 

Error- Weight Matrix WM4 

Test Set SU* SV US* UV VS VU role U V Overall 
TS1 29.5 0 5.3 2.5 0 0.9 29.5 7.9 0.9 1.1 
TS2 39.9 5.8 4.8 9.1 0 1.4 45.7 13.9 1.4 3.1 
TS3 33.3 5.9 5.0 5.9 0 2.0 39.3 10.9 2.0 3.1 

Error-Weight Matrix WM2 

Test Set SU SV US UvV* VS VU* S Ue Ver Overall 
TS1 3.4 0 2.0 2.0 0 2.0 3.4 4.0 2.0 0.5 
TS2 7.3 7.3 4,3 11.0 0 | 14.5 15.3 2.7 23 
TS3 7.4 7.4 5.9 11.3 0 3.0 14.8 17.2 3.0 2.6 





* These results had 0 weight in the overall error score and therefore did not affect the choice of features. 
** Only a single component of this error score is included in the overall score. 


Table IX — Optimal features for wideband test sets and error- 
weight matrices WM1, WM4, and wmM2 


i aS tt ee 
Test Set WMI1 WM4 WM2 


ee eg ae 8. 

TS1 13 (0,1) 68 ES 64 ML 
70 MLS 95 k(1) 70 MLS 
14 (0,2) 69* NZS 61 LE 
61 LE 67* MLD 14 (0,2) 
68* ES 68 ES 

TS2 13 (0,1) 68 ES 64 ML 
70 MLS 95 (1) 70 MLS 
14 (0,2) 16 (0,4) 61 LE 
61 LE 64 ML 68 ES 
68 ES 67 MLD 

TS3 13 (0,1) 68 ES 64 ML 
70 MLS 15 (0,3) 70 MLS 
14 (0,2) 26 k(2) 61 LE 
61 LE 13 (0,1) 68 ES 
68 ES 52 (4) 67 MLD 


4.2 Typical test example 


Figures 10 and 11 show the results of applying the classification 
method to the utterance, “Few thieves are never sent to the jug,” spoken 
by a male speaker. The contour shown in (a) of each figure is a manual 
classification of each frame. Part (b) shows the results of analysis using 
parameters obtained from WM1 (Fig. 10) and WM4 (Fig. 11). Part (c) 
shows the results of nonlinearly smoothing the analysis contours using 
a median smoother.® Parts (d), (e), and (f) show plots of the probability 
of correct classification based on the distance calculation for each class; 
1.e., if we denote the distance calculated for silence as D,, the distance 
calculated for unvoiced as D,,, and the distance calculated for voiced as 
D,, then 


D,,D 
a eet eerie 20S | ee 17 
BO) D,Dy + DsDy + DyDe ae 
=e Ce Sea RPP 18 
wd) D,D, + D.D, + DyD»y we) 
D,;Dy 
= tt 19 
es D,D, + DsD, + DyDe ty?) 


It can be seen that P(S), P(U), and P(V) define a probability measure, 
since 


P(s) + P(u) + P(v) = 1 (20) 
and | 
0 <= P(s), P(u), P(v) <1 (21) 
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V HAND ANALYSIS 
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(f) 
0 
0 50 100 150 200 


FEW THIEVES ARE NEVER SENT TO THE JUG 


Fig. 10—The analysis results for the utterance, “Few Thieves are Never Sent to the Jug,” 
using optimal features from TS1 with weight matrix WM1. 


for all values of D,, D,,, and D,. Furthermore, the probabilities satisfy 
the relation 


lim P(a) > 1 (22) 
Da—0 
and 
lim P(a) — 0. (23) 
Da 


Thus, as the distance increases, the probability measures decrease. 
Contrasting the silence-unvoiced-voiced contours of Figs. 10 and 11, 
the following observations can be made: 


(i) The results obtained using features derived from matrix WM4 es- 
sentially never classified frames as silence. Instead all silence frames 


- SPEECH EVALUATION 479 


HAND ANALYSIS 


Vv 
U 
(a) 
S 
V RAW DATA 
U 
(b) 
S 
y SMOOTHED 
U 
c) 
: ( 
1 P (SILENCE) 
| | y (d) 
0 
P (UNVOICED) 
1 
(c) 
0 
P (VOICED) 
(f) 
0 
0 50 100 150 200 


FEW THIEVES ARE NEVER SENT TO THE JUG 


Fig. 11—The analysis results for the same utterance as Fig. 10 using optimal features from 
TS1 with weight matrix WM4. 


were classified as unvoiced, consistent with the zero weight given 
to this type of error. 


(11) Both sets of results contain only a small number of misclassifica- 
tions of voiced intervals. All but one of these voiced misclassifica- 
tions occurred at boundaries between voiced and nonvoiced 
speech. 


(ii) The probability measures for voiced speech using features derived 
from WM4 were somewhat higher throughout the voiced regions 
than corresponding results derived from WM1 features. This indi- 
cates that a somewhat better feature set for voiced sounds is ob- 
tained at the tradeoff of the high error rate for silence-to-unvoiced 
errors (and vice versa). 


Results similar to those discussed above have been obtained for a wide 
variety of utterances tested on the system using these sets of features. 
It is concluded that if one is willing to forego any attempt at distin- 
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suishing between unvoiced sounds and silence, then reliable voiced- 
nonvoiced decisions can be obtained over telephone lines. 


V. SUMMARY 


Through a series of fairly extensive tests, we have investigated quite 
thoroughly the potential of a fairly sophisticated silence-unvoiced-voiced 
classification system. We have shown that, depending on the weight 
attached to various types of misclassifications, a set of optimal features 
can be found that minimizes the weighted misclassification error rate. 
For telephone line inputs, the results showed that reliable discrimination 
between silence and unvoiced sounds is quite difficult; however, reliable 
discrimination between voiced and nonvoiced sounds (silence or un- 
voiced speech) can be achieved at error rates fairly close to those obtained 
with wideband input signals. 

Extensive testing of the optimal feature sets obtained from the analysis 
showed the method to be reliable enough for use in several typical ap- 
plications in the area of man-machine communication by voice.®1° 

One aspect of the analysis system which was not varied was the dis- 
tance metric used in the final classification. Although the non-Euclidean 
distance metric is a very powerful one for the features studied, other 
distance metrics have been proposed based on fixed parameter sets, such 
as the LPC parameters, etc.*:1! Investigations into the applicability of 
such distance metrics to the silence-unvoiced-voiced classification 
problem are currently in progress. 
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